SET PERSIST in MySQL: A Small Thing for Setting System Variable Values


Setting the correct system variable values is an essential step in getting the correct server behavior for your workload.
In MySQL, we have many system variables that can be changed at runtime, and most of them can be set at the session or global level.

In the past, to change the value of a system variable at the global level, users needed to have the SUPER privilege. Once the system variable value is modified globally, the server changes its behavior for new sessions and, obviously, at the global scope.

For instance, one of the most commonly adjusted variables is probably max_connections.

If you have max_connections=100 in your my.cnf or as the default value, and during the day as a DBA you notice that it is not enough, it is easy to allow more connections on the fly with the command:


SET GLOBAL max_connections=150;

This will do the work.

But here is the issue: we changed a GLOBAL value that applies to the whole server, but this change is ephemeral, and if the server restarts, the setting is lost. In the past, I have seen, millions of times, servers whose my.cnf and current server settings differed. To prevent this, or at least keep it under control, good DBAs developed scripts to check if and where the differences exist and fix them. The main issue is that very often we forget to update the configuration file while doing the changes, or we do it on purpose to do “fine-tuning first” and forget afterward.

What’s new in MySQL 8 about that?

Well, we have a couple of small changes. First of all, the privileges: as of MySQL 8, users can have either SYSTEM_VARIABLES_ADMIN or SUPER to modify GLOBAL system variables. Then, the ability to have GLOBAL changes to a variable PERSIST on disk and, finally, to know who did it and when.

The new option for the SET command is PERSIST.

So, if I have:

(root@localhost) [(none)]>show global variables like 'max_connections';
| Variable_name   | Value |
| max_connections | 1500  |

and I want to change the value of max_connections and be sure this value is reloaded at restart, I will do this:

(root@localhost) [(none)]>set PERSIST max_connections=150;

(root@localhost) [(none)]>show global variables like 'max_connections';
| Variable_name   | Value |
| max_connections | 150   |

With the usage of PERSIST, MySQL will write the information related to:

– key (variable name)
– value
– timestamp (including microseconds)
– user
– host

A new file in the data directory, mysqld-auto.cnf, contains the information. The file is in JSON format and contains the following:

{ "Version" : 1 , "mysql_server" : {
  "max_connections" : {
    "Value" : "150" , "Metadata" : {
      "Timestamp" : 1565363808318848 , "User" : "root" , "Host" : "localhost"
    }
  }
} }

The information is also in Performance Schema:

    select a.VARIABLE_NAME, b.VARIABLE_VALUE, a.SET_TIME, a.SET_USER, a.SET_HOST
        from performance_schema.variables_info a
            join performance_schema.global_variables b
              on a.VARIABLE_NAME = b.VARIABLE_NAME
        where b.VARIABLE_NAME like 'max_connections'\G

*************************** 1. row ***************************
 VARIABLE_NAME: max_connections
VARIABLE_value: 150
      SET_TIME: 2019-08-09 11:16:48.318989
      SET_USER: root
      SET_HOST: localhost

As you see, it reports who did the change, from where, when, and the value. Unfortunately, there is no history here, but this can be easily implemented.

To clear the PERSIST settings, run RESET PERSIST and all the persisted settings will be removed.

If you have:

{ "Version" : 1 , "mysql_server" : {
  "max_connections" : { "Value" : "151" , "Metadata" : { "Timestamp" : 1565367524946371 , "User" : "root" , "Host" : "localhost" } } , 
  "wait_timeout" : { "Value" : "151" , "Metadata" : { "Timestamp" : 1565368154036275 , "User" : "root" , "Host" : "localhost" } } 
} }


and you run RESET PERSIST, the file becomes:

{ "Version" : 1 , "mysql_server" : { } }

This removes ALL THE SETTINGS, not just one.

Anyhow, why is this a good thing to have?

First, because now we have no excuse when we change a variable: we have all the tools needed to make sure it will be applied at startup, if that is the intention of the change.

Second, it is good because storing the information in a file, and not only exposing it in Performance Schema, allows us to include it in any automation tool we have, whether we decide to take action or just to keep track of it: for example, comparing it with my.cnf and fixing the discrepancies automatically, also when the service is down or when cloning. On this, let me say that while you CAN change the value in the file mysqld-auto.cnf and have the server use that value as the valid one at restart, this is not recommended; instead, please fix my.cnf and remove the information related to PERSIST.
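As an illustration, a minimal sketch of that kind of automation might look like this; the file paths and the my.cnf parsing are simplified assumptions, not production code. It reads mysqld-auto.cnf, in the format shown above, and reports persisted variables that differ from my.cnf:

```python
import json


def read_persisted(path):
    """Parse mysqld-auto.cnf and return a {variable: value} dict."""
    with open(path) as f:
        data = json.load(f)
    return {name: meta["Value"] for name, meta in data["mysql_server"].items()}


def read_mycnf(path):
    """Tiny my.cnf parser: key = value lines in the [mysqld] section only."""
    values, in_mysqld = {}, False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("["):
                in_mysqld = (line == "[mysqld]")
            elif in_mysqld and "=" in line and not line.startswith("#"):
                key, _, val = line.partition("=")
                values[key.strip().replace("-", "_")] = val.strip()
    return values


def persisted_vs_mycnf(auto_path, cnf_path):
    """Return {variable: (my.cnf value or None, persisted value)} for mismatches."""
    persisted = read_persisted(auto_path)
    mycnf = read_mycnf(cnf_path)
    return {k: (mycnf.get(k), v) for k, v in persisted.items() if mycnf.get(k) != v}
```

With the example above (max_connections persisted at 150 but set to 100 in my.cnf), such a check would flag the discrepancy so that a tool, or a human, can reconcile the two files.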

Touching that file is also dangerous, because if you do something like removing a double quote, or in any way break the JSON format, the server will not start, and there will be NO error in the log.

{ "Version" : 1 , "mysql_server" : { "wait_timeout": { "Value : "150" , "Metadata" : { "Timestamp" : 1565455891278414, "User" : "root" , "Host" : "localhost" } } } }
                                                           ^^^ missing double quote

tusa@tusa-dev:/opt/mysql_templates/mysql-8.0.17futex$ ps aux|grep 8113
tusa      8119  0.0  0.0  14224   896 pts/1    S+   12:54   0:00 grep --color=auto 8113
[1]+  Exit 1                  bin/mysqld --defaults-file=./my.cnf

I have opened a bug report for this.

A short deep dive into the code (you can skip it if you don’t care)

The new feature is handled in the files <source>/sql/persisted_variable.(h/cc). The new structure dealing with the PERSIST actions is:

struct st_persist_var {
  std::string key;
  std::string value;
  ulonglong timestamp;
  std::string user;
  std::string host;
  bool is_null;
  st_persist_var(THD *thd);
  st_persist_var(const std::string key, const std::string value,
                 const ulonglong timestamp, const std::string user,
                 const std::string host, const bool is_null);
};

And the main steps are in the st_persist_var constructors. It should be noted that when creating the timestamp, the code generates a value that is NOT fully compatible with the MySQL function FROM_UNIXTIME.

The code assigning the timestamp value also includes the microseconds from the timeval (tv) structure:

st_persist_var::st_persist_var(THD *thd) {
  timeval tv = thd->query_start_timeval_trunc(DATETIME_MAX_DECIMALS);
  timestamp = tv.tv_sec * 1000000ULL + tv.tv_usec;
  user = thd->security_context()->user().str;
  host = thd->security_context()->host().str;
  is_null = false;
}

tv.tv_sec = 1565267482
tv.tv_usec = 692276

will generate:
timestamp = 1565267482692276

This timestamp is not valid in MySQL and cannot be read by the time functions, while the part related to tv.tv_sec = 1565267482 works perfectly.

(root@localhost) [(none)]>select FROM_UNIXTIME(1565267482);
| FROM_UNIXTIME(1565267482) |
| 2019-08-08 08:31:22       |

(root@localhost) [(none)]>select FROM_UNIXTIME(1565267482692276);
| FROM_UNIXTIME(1565267482692276) |
| NULL                            |

This is because the timestamp with microseconds needs to be formatted differently in MySQL:
PERSIST_code = 1565267482692276
MySQL        = 1565267482.692276

If I run: select FROM_UNIXTIME(1565267482.692276);

I get the right result:

| FROM_UNIXTIME(1565267482.692276) |
| 2019-08-08 08:31:22.692276       |

And of course, I can use the trick:

select FROM_UNIXTIME(1565267482692276/1000000);
| FROM_UNIXTIME(1565267482692276/1000000) |
| 2019-08-08 08:31:22.6922                |

Well, that’s all for the behind-the-scenes info. Keep this in mind if you want to deal with the value coming from the JSON file.
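For instance, a quick Python sketch of the same conversion, assuming the raw microsecond value taken from the JSON file:

```python
from datetime import datetime, timezone


def persist_ts_to_datetime(ts):
    """Convert the microsecond epoch value stored by SET PERSIST
    (e.g. 1565267482692276) into a datetime, keeping the microseconds."""
    sec, usec = divmod(ts, 1_000_000)  # split seconds and microseconds
    return datetime.fromtimestamp(sec, tz=timezone.utc).replace(microsecond=usec)


print(persist_ts_to_datetime(1565267482692276))
# 2019-08-08 12:31:22.692276+00:00  (UTC; the MySQL outputs above are in the server's local time)
```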

SET PERSIST Conclusion

Sometimes the small things can be better than the HUGE shiny things. Many times I have seen DBAs in trouble because they did not have this simple feature in MySQL, and many MySQL servers fail to start or do not behave as expected because of it. Given that, I welcome SET PERSIST and I am sure that the people who manage thousands of servers, with different workloads and automation in place, will see this as a good thing as well.



MongoDB: Impact-free Index Builds using Detached ReplicaSet Nodes


Creating an index on a MongoDB collection is simple; just run the command createIndex and that’s all there is to it. There are several index types available, and in a previous post you can find the most important ones: MongoDB index types and explain().

The command is quite simple, but for MongoDB, building an index is probably the most complicated task. I’m going to explain what the potential issues are and the best way to create any kind of index on a Replica Set environment.

A Replica Set is a cluster of mongod servers, at least 3, where the complete database is replicated. Some of the main advantages of this kind of structure are automatic failover and read scalability. If you need more familiarity with Replica Set, you may take a look at the following posts:

Deploy a MongoDB Replica Set with Transport Encryption (Part 1)

Deploy a MongoDB Replica Set with Transport Encryption (Part 2)

Deploy a MongoDB Replica Set with Transport Encryption (Part 3)

Create Index Impact

As we said, creating an index in MongoDB can have a really severe impact. A simple index creation on a field, like the following, blocks all other operations on the database:

db.people.createIndex( { email: 1 } )

This could be ok for a very small collection, let’s say where the building will take a few milliseconds. But for larger collections, this is absolutely forbidden.

We call this way of building an index the “foreground” mode.

The foreground index creation is the fastest way, but since it is blocking we have to use something different in the production environments. Fortunately, we can also create an index in “background” mode. We may use the following command:

db.people.createIndex( { email : 1 }, { background: true } )

The index will be built in the background by mongod using a different incremental approach. The advantage is that the database can continue to operate normally without any lock. Unfortunately, background creation takes much longer than the foreground build.

The first hint is then to create the indexes using the background option. This is OK, but not in all the cases. More on that later.

Another impact when building an index is memory usage. By default, MongoDB uses up to 500MB of memory for building the index, but you can override this limit if the index is larger. The larger the index, the higher the impact if you don’t have the capability to assign more memory to the task.

To increase the amount of memory for index builds, set the following in the configuration file:

setParameter:
  maxIndexBuildMemoryUsageMegabytes: 1024

This example sets it to 1 GB.

Create Index on a Replica Set

The index creation command is replicated to all the nodes of the cluster the same way all the other write commands are replicated. A foreground creation on the PRIMARY is replicated as foreground on the SECONDARY nodes. A background creation is replicated as background on the SECONDARY nodes as well.

The same limitations apply to a Replica Set as to a standalone server: the foreground build is fast but blocking, and the background build is not blocking, but it is significantly slower for very large collections.

So, what can we do?

If you need to create a small index, let’s say the size is less than the available memory, you can rely on the background creation on the PRIMARY node. The operation will be correctly replicated to the SECONDARY nodes and the overall impact won’t be too bad.

But if you need to create an index larger than the memory, on a huge collection, then even the background build is bad. The creation will have a significant impact on the server resources and you can get overall performance problems on all the nodes. In this case, we have to follow another procedure. The procedure requires more manual steps, but it’s the only way to properly build such a large index.

The idea is to detach one node at a time from the Replica Set, create the index, and rejoin the node to the cluster. But first, you need to take care of the oplog size. The oplog window should be large enough to give you the time for the index build while a node is detached.

Note: the oplog window is the timestamp difference between the first entry in the oplog and the most recent one. It represents the maximum amount of time you can have a node detached from the cluster for any kind of task (software upgrades, index builds, backups). If you rejoin the node inside the window, the node will be able to catch up with the PRIMARY just by executing the missing operations from the oplog. If you rejoin the node after the window, it will have to copy all the collections completely. This takes a long time and an impressive amount of bandwidth for large deployments.
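To make the note concrete, here is a minimal Python sketch for computing the oplog window; the pymongo part, shown only in comments, is an assumption about how you would fetch the two entries:

```python
from datetime import timedelta


def oplog_window(first_entry_ts, last_entry_ts):
    """Oplog window: the time span between the first and the most recent
    oplog entry. Arguments are Unix timestamps in seconds (the `time`
    part of the BSON Timestamp stored in the `ts` field)."""
    return timedelta(seconds=last_entry_ts - first_entry_ts)


# With pymongo (connection details are hypothetical), the two timestamps
# could be fetched from the local.oplog.rs collection roughly like this:
#
#   oplog = client.local["oplog.rs"]
#   first = oplog.find().sort("$natural", 1).limit(1).next()["ts"].time
#   last = oplog.find().sort("$natural", -1).limit(1).next()["ts"].time
#   print(oplog_window(first, last))

print(oplog_window(1_565_000_000, 1_565_172_800))  # a 2-day window
```

If the expected index build time is anywhere close to this window, grow the oplog before detaching any node.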

The following is the step by step guide:

  • choose one of the SECONDARY nodes
  • detach the node from the Replica Set
    • comment out the replSetName option in the configuration file
    • set a different port number
    • set the parameter disableLogicalSessionCacheRefresh to true
net:
   bindIp: localhost
   port: 27777
#  port: 27017
#replication:
#  replSetName: RSCluster
setParameter:
   disableLogicalSessionCacheRefresh: true

    • restart mongod
    • now the server is running as a standalone; operations won’t be replicated
  • connect using the alternative port and build the index in foreground mode
db.people.createIndex( { email: 1 } )

  • connect the node to the Replica Set
    • uncomment the options in the configuration file
    • remove the disableLogicalSessionCacheRefresh option
    • restart mongod
    • now the node is a member of the Replica Set
    • wait some time for the node to catch up with the PRIMARY
  • repeat the previous steps for all the remaining SECONDARY nodes
  • stepdown the PRIMARY node to force an election
    • run the rs.stepDown() command; this forces an election. Wait some time for the PRIMARY to become a SECONDARY node.
  • restart it as standalone
    • use the same procedure we saw before
  • build the index in foreground mode
  • restart the node and connect it to the Replica Set

That’s all. We have created the index on all the nodes without any impact for the cluster and for the production applications.

Note: when restarting a node as a standalone, the node could be exposed to mistaken writes. To be safe, a good practice is to disable TCP connections, allowing only local connections through the socket. For example, you can put into the configuration file:
bindIp: /tmp/mongod.sock


This procedure is definitely more complicated than running a single command. It will require some time, but we hope you don’t have to create such large indexes every single day.


MySQL: Disk Space Exhaustion for Implicit Temporary Tables


I was recently faced with a real issue of completely exhausting the disk space on MySQL. This was a serious issue because of the continuous outages of the service: the customer had to constantly restart the server and wait for the next outage.

What was happening? In this article, I’m going to explain it and propose solutions.

Implicit temporary tables

MySQL needs to create implicit temporary tables for solving some kinds of queries. The queries that require a sorting stage most of the time need to rely on a temporary table. For example, when you use GROUP BY, ORDER BY or DISTINCT.  Such queries are executed in two stages: the first is to gather the data and put them into a temporary table, the second is to execute the sorting on the temporary table.

A temporary table is also needed in case of some UNION statements evaluation, for VIEW that cannot use merge, for derived tables when using subqueries, for multiple-table UPDATE, and some other cases.

If the temporary table is small, it can be created in memory; otherwise, it’s created on disk. Needless to say, an in-memory temporary table is faster. MySQL creates an in-memory table first, and if it becomes too large, it is converted to an on-disk table. The maximum size for in-memory temporary tables is defined by the tmp_table_size or max_heap_table_size value, whichever is smaller. The default size in MySQL 5.7 is 16MB. If you run queries on a large amount of data, or if you have not optimized queries, you can increase these variables. When setting the threshold, take into consideration the available RAM and the number of concurrent connections you have during your peaks. You cannot increase the variables indefinitely, as at some point you’ll need to let MySQL use on-disk temporary tables.
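As a quick illustration of the “whichever is smaller” rule, and of why concurrency matters when raising the thresholds, here is a small sketch (the numbers are arbitrary examples, not recommendations):

```python
def effective_tmp_table_limit(tmp_table_size, max_heap_table_size):
    """MySQL converts an in-memory temporary table to disk once it exceeds
    the smaller of tmp_table_size and max_heap_table_size."""
    return min(tmp_table_size, max_heap_table_size)


def worst_case_tmp_memory(limit_bytes, max_connections):
    """Rough upper bound: every connection builds one in-memory temp table
    of the maximum allowed size at the same time."""
    return limit_bytes * max_connections


limit = effective_tmp_table_limit(64 * 1024**2, 32 * 1024**2)
print(limit // 1024**2, "MB")  # effective per-table limit: 32 MB
print(worst_case_tmp_memory(limit, 200) // 1024**3, "GB")  # worst case across 200 connections
```

Raising tmp_table_size alone does nothing if max_heap_table_size stays lower, and the worst-case figure shows why the peak connection count must be part of the sizing decision.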

Note: the temporary table is created on-disk if the tables involved have TEXT or BLOB columns, even if the size is less than the configured threshold.

Temporary tables storage engine

Up to MySQL 5.6, all on-disk temporary tables were created as MyISAM. The temporary table is created in-memory or on-disk, depending on the configuration, and it’s dropped immediately at the end of the query. From MySQL 5.7, on-disk temporary tables are created as InnoDB by default, so you can rely on its advanced features.

The new default is the best option for the overall performance and should be used in the majority of the cases.

A new configuration variable is available to set the storage engine for the temporary tables: internal_tmp_disk_storage_engine. The variable can be set to innodb (the default if not set) or myisam.

The potential problem with InnoDB temporary tables

Although using InnoDB is the best for performance, a new potential issue could arise. In some particular cases, you can have disk exhaustion and server outage.

Like any other InnoDB table in the database, temporary tables have their own tablespace file. The new file is in the data directory together with the general tablespace, with the name ibtmp1, and it stores all the tmp tables. A tablespace file cannot be shrunk, and it grows constantly unless you run a manual OPTIMIZE TABLE; ibtmp1 is no exception, except that you cannot use OPTIMIZE on it at all. The only way to shrink ibtmp1 back to zero is to restart the server.

Fortunately, even if the file cannot shrink, after the execution of a query the temporary table is automatically dropped and the space in the tablespace can be reused for another incoming query.
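Since the file can only grow, it is worth keeping an eye on it. A minimal monitoring sketch (the data directory path and the threshold are hypothetical):

```python
import os


def ibtmp1_size_mb(datadir, filename="ibtmp1"):
    """Current size of the temporary tablespace file, in MB.
    The file can only grow; a restart is the only way to shrink it."""
    return os.path.getsize(os.path.join(datadir, filename)) / (1024 * 1024)


def needs_attention(datadir, threshold_mb, filename="ibtmp1"):
    """True when ibtmp1 has grown past the given threshold."""
    return ibtmp1_size_mb(datadir, filename) > threshold_mb


# Example usage (hypothetical data directory and threshold):
# if needs_attention("/var/lib/mysql", threshold_mb=10240):
#     print("ibtmp1 is larger than 10GB: check the queries, plan a restart")
```

Hooked into your monitoring system, a check like this gives you warning before the volume fills up, instead of the first alert being the outage itself.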

Let’s think now about the following case:

  • you have non-optimized queries that require the creation of very large on-disk tmp tables
  • you have optimized queries, but they are creating very large on-disk tmp tables because you are intentionally doing computations on a very large dataset (statistics, analytics)
  • you have a lot of concurrent connections running the same queries with tmp table creation
  • you don’t have a lot of free space in your volume

In such a situation, it’s easy to understand that the ibtmp1 file can grow considerably and easily exhaust the free space. In my customer’s case, this was happening several times a day, and the server had to be restarted in order to completely shrink the ibtmp1 tablespace.

The concurrent queries don’t even have to be launched at exactly the same time. Since a query with a large temporary table will take several seconds or minutes to execute, it is sufficient to have queries launched at different times while the preceding ones are still running. Also, you have to consider that every connection creates its own temporary table, so the exact same query will create another copy of the same temporary table in the tablespace. Exhausting the disk space is very easy with non-shrinkable files!

So, what to do to avoid disk exhaustion and outages?

The trivial solution: use a larger disk

This is really trivial and can solve the problem, but it is not the optimal solution. In fact, it’s not so easy to figure out what your new disk size should be. You can guess by increasing the disk size step by step, which is quite easy to do if your environment is in the cloud or you have virtual appliances on a very large platform. But it’s not easy to do in on-premise environments.

But with this solution, you can risk having unneeded expenses, so keep that in mind.

You can also move the ibtmp1 file on a dedicated large disk, by setting the following configuration variable:

innodb_temp_data_file_path = ../../tmp/ibtmp1:12M:autoextend

A MySQL restart is required.

Please note that the path has to be specified as relative to the data directory.

Set an upper limit for ibtmp1 size

For example:

innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:10G

In this case, the file cannot grow more than 10GB. You can easily eliminate the outages, but this is a dangerous solution. When the data file reaches the maximum size, queries fail with an error indicating that the table is full. This is probably bad for your applications.

Step back to MyISAM for on-disk temporary tables

This solution seems counterintuitive, but it could be the best way to avoid the outages in a matter of seconds, while still guaranteeing the creation of all the needed temporary tables.

You can set the following variable into my.cnf:

internal_tmp_disk_storage_engine = MYISAM

Since the variable is dynamic you can set it also at runtime:

SET GLOBAL internal_tmp_disk_storage_engine = MYISAM;

Stepping back to MyISAM, you will considerably decrease the possibility of completely filling your disk space. In fact, the temporary tables will be created into different files and immediately dropped at the end of the query. No more issues about a forever increasing file.

While you could in theory still hit the same issue if the queries run at the exact same time, or really very close together, it becomes far less likely. In my real-world case, this was the solution that avoided all the outages.

Optimize your queries

This is the most important thing to do. After stepping the storage engine back to MyISAM to mitigate the outage occurrences, you absolutely have to take the time to analyze the queries.

The goal is to decrease the size of the on-disk temporary tables. It’s not the aim of this article to explain how to investigate the queries, but you can rely on the slow log, on a tool like pt-query-digest and on EXPLAIN.

Some tips:

  • create missing indexes on the tables
  • add more filters in the queries to gather less data, if you don’t really need it
  • rewrite queries to optimize the execution plan
  • if you have very large queries on purpose, you can use a queue manager in your applications to serialize their executions or to decrease the concurrency

This will be the longest activity, but hopefully, after all the optimizations, you can set the temporary storage engine back to InnoDB for better performance.


Sometimes the improvements can have unexpected side effects. The InnoDB storage engine for on-disk temporary tables is a good improvement, but in some particular cases, for example if you have non-optimized queries and little free space, you can have outages because of the “disk full” error. Stepping the tmp storage engine back to MyISAM is the fastest way to avoid outages, but optimizing the queries is the most important thing to do as soon as possible, in order to return to InnoDB. And yes, even a larger or dedicated disk may help. It’s a trivial suggestion, I know, but it can definitely help a lot.

By the way, there’s a feature request about the issue.




Upcoming Webinar 5/16: Monitoring MongoDB with Percona Monitoring and Management (PMM)


Please join Percona’s Product Manager Michael Coburn as he presents his talk Monitoring MongoDB with Percona Monitoring and Management (PMM) on May 16th, 2019, at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

Learn how to monitor MongoDB using Percona Monitoring and Management (PMM), which will allow you to:

  • Gain greater visibility of database performance and bottlenecks
  • Consolidate your MongoDB servers into the same monitoring platform you already use for MySQL and PostgreSQL
  • Respond more quickly and efficiently to Severity 1 issues

We will show you how to use PMM’s native support to have MongoDB integrated in just a few minutes!

In order to learn more, register for our webinar.


Watch Webinar: Speaking the same language as Developers and DBAs


Please join Percona’s Product Manager Michael Coburn as he presents his talk Speaking the Same Language as Developers and DBAs.

Watch the Recorded Webinar

In this talk, we’ll go over some of the high-level monitoring concepts that PMM enables and that you, the CEO, can leverage in order to speak the same language as your technical team. For instance, we’ll cover:

– Which thresholds to follow for CEOs
– Host-specific resources – CPU, Memory, Disk, Network
– Database-specific resources – Connections, Queries per Second, Slow Queries, Replication lag
– Building custom KPI metrics using PMM
– Visualize Application logic
– Combine Application and Databases under one view

In order to learn more and speak the same language as Developers and DBAs, watch our webinar.


Watch Webinar: Config Database (Shardings) Deep Dive


Please join Percona’s Senior Support Engineer Vinodh Krishnaswamy as he presents Deep Dive In the Config Database (Shardings).

Watch the Recorded Webinar

In MongoDB, we know the Sharded Cluster can migrate chunks across the Shards and also route a query. But where is this information stored, and how is it used by MongoDB to maintain consistency across the Sharded Cluster?

In this webinar, we will look at the config database in detail as well as the related metadata.

Watch our webinar to join us in taking a deep dive into the MongoDB config database.


Watch Webinar: Upgrading / Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions


Please join Percona’s PostgreSQL Support Technical Lead Avinash Vallarapu and Senior Support Engineers Fernando Laudares, Jobin Augustine, and Nickolay Ihalainen as they demonstrate the methods to upgrade a legacy version of PostgreSQL to a newer version using built-in as well as open source tools.

Watch the Recorded Webinar

To start, this webinar opens with a list of solutions that are built-in to PostgreSQL to help us upgrade a legacy version of PostgreSQL with minimal downtime. Next, the advantages of choosing such methods will be discussed. You will then notice a list of prerequisites for each solution, reducing the scope of possible mistakes. It’s important to minimize downtime when upgrading from an older version of PostgreSQL server. Therefore, we will present 3 open source solutions that will help us either minimize or completely avoid downtime.

Additionally, we will be publishing a series of 5 blog posts that will help us understand the solutions available to perform a PostgreSQL upgrade. Our presentation will show the full process of upgrading a set of PostgreSQL servers to the latest available version. Furthermore, we’ll show the pros and cons of each of the methods we employed.

Topics covered in this webinar include the following:

1. PostgreSQL upgrade using pg_dump/pg_restore (with downtime)
2. PostgreSQL upgrade using pg_dumpall (with downtime)
3. Continuous replication from a legacy PostgreSQL version to a newer version using Slony.
4. Replication between major PostgreSQL versions using Logical Replication
5. Fast upgrade of legacy PostgreSQL with minimum downtime.

We will walk you through and demonstrate methods that you may find useful in your database environment. We will witness how simple and quick it is to perform the upgrade using these methods.


Watch Webinar: Billion Goods in Few Categories – How Histograms Save a Life?

Please join Percona’s Principal Support Engineer Sveta Smirnova as she presents Billion Goods in Few Categories: How Histograms Save a Life?

Watch the Recorded Webinar

We store data with the intention to use it: search, retrieve, group, sort, etc. To perform these actions effectively, MySQL storage engines index data and communicate statistics with the Optimizer when it compiles a query execution plan. This approach works perfectly well unless your data distribution is uneven.

Last year I worked on several tickets where data followed the same pattern: millions of popular products fit into a couple of categories and the rest used all the others. We had a hard time finding a solution for retrieving goods fast. Workarounds for version 5.7 were offered. However, we learned a new MySQL 8.0 feature – histograms – would work better, cleaner, and faster. Thus, the idea of our talk was born.

In this webinar, we will discuss:
– How index statistics are physically stored
– Which data is exchanged with the Optimizer
– Why it is not enough to make a correct index choice

In the end, I will explain which issues are resolved by histograms, and we will discuss why index statistics alone are insufficient for the fast retrieval of unevenly distributed data.


Creating Custom Sysbench Scripts


Sysbench has long been established as the de facto standard when it comes to benchmarking MySQL performance. Percona relies on it daily, and even Oracle uses it when blogging about new features in MySQL 8. Sysbench comes with several pre-defined benchmarking tests. These tests are written in an easy-to-understand scripting language called Lua. Some of these tests are called: oltp_read_write, oltp_point_select, tpcc, oltp_insert. There are over ten such scripts to emulate various behaviors found in standard OLTP applications.

But what if your application does not fit the pattern of traditional OLTP? How can you continue to utilize the power of load-testing, benchmarking, and results analysis with sysbench? Just write your own Lua script!

For those that want to jump ahead and see the full source, here you go.

Sysbench API

To start off, each Lua script you create must implement three core sysbench-Lua API functions. These are thread_init, thread_done, and event. You can read the comments in the code below for the meaning of each function and what is happening inside.

-- Called by sysbench one time to initialize this script
function thread_init()
  -- Create globals to be used elsewhere in the script
  -- drv - initialize the sysbench mysql driver
  drv = sysbench.sql.driver()
  -- con - represents the connection to MySQL
  con = drv:connect()
end

-- Called by sysbench when script is done executing
function thread_done()
  -- Disconnect/close connection to MySQL
  con:disconnect()
end

-- Called by sysbench for each execution
function event()
  -- If user requested to disable transactions,
  -- do not execute BEGIN statement
  if not sysbench.opt.skip_trx then
    con:query("BEGIN")
  end

  -- Run our custom statements here

  -- Like above, if transactions are disabled,
  -- do not execute COMMIT
  if not sysbench.opt.skip_trx then
    con:query("COMMIT")
  end
end

That’s all pretty simple and should function as a good template in your scripts. Now let’s take a look at the rest of the script.

Sanity checks and options

Now let’s get into the core code. At the top you’ll find the following sections:

if sysbench.cmdline.command == nil then
   error("Command is required. Supported commands: run")
end

sysbench.cmdline.options = {
  point_selects = {"Number of point SELECT queries to run", 5},
  skip_trx = {"Do not use BEGIN/COMMIT; Use global auto_commit value", false}
}

The first section is a sanity check to make sure the user actually wants to run this test. Other test scripts, mentioned above, support commands like prepare, run, and cleanup. Our script only supports run as the data we are using is pre-populated by our core application.

The second section allows us, the script writers, to let the user pass options specific to our test script. In the code above, we can see an option for the number of SELECT statements that will be run on each thread/iteration (default is 5), and another option that lets the user disable BEGIN/COMMIT if they so desire (default is false). If you want more customization in your script, simply add more options. You’ll see how to reference these parameters later on.

The queries

Now it is time to define the custom queries we want to execute in our script.

-- Array of categories to be used in the INSERTs
local page_types = { "actor", "character", "movie" }

-- Array of COUNT(*) queries
local select_counts = {
  "SELECT COUNT(*) FROM imdb.title"
}

-- Array of SELECT statements that have 1 integer parameter
local select_points = {
  "SELECT * FROM imdb.title WHERE id = %d",
  "SELECT * FROM imdb.comments ORDER BY id DESC limit 10",
  "SELECT AVG(rating) avg FROM imdb.movie_ratings WHERE movie_id = %d",
  "SELECT * FROM imdb.users ORDER BY RAND() LIMIT 1"
}

-- Array of SELECT statements that have 1 string parameter
local select_string = {
  "SELECT * FROM imdb.title WHERE title LIKE '%s%%'"
}

-- INSERT statements
local inserts = {
  "INSERT INTO imdb.users (email_address, first_name, last_name) VALUES ('%s', '%s', '%s')",
  "INSERT INTO imdb.page_views (type, viewed_id, user_id) VALUES ('%s', %d, %d)"
}

The above code defines several arrays/lists of different queries. Why is this necessary? Later on in the code, we will have to parse each SQL statement and populate/replace the various parameters with randomly generated values. It would not do us any good to repeat the same SELECT * FROM fooTable WHERE id = 44 every time, now would it? Certainly not. We want to generate random numbers and have our queries select from the entire dataset.

Some queries have no parameters, some have integer-based, and some string-based. We will handle these differently below, which is why they are in different arrays above. This method also allows for future expansion. When you want to run additional queries within the script, just add another line to each array; no need to change any other code.
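If it helps to see the template-substitution idea outside of Lua, here is a minimal Python sketch of the same pattern. The list contents mirror the Lua select_points array above; render_point_select is a hypothetical helper written for illustration, not part of sysbench:

```python
import random

# Integer-parameter SELECT templates, mirroring the Lua select_points array
select_points = [
    "SELECT * FROM imdb.title WHERE id = %d",
    "SELECT AVG(rating) avg FROM imdb.movie_ratings WHERE movie_id = %d",
]

def render_point_select(rng=random):
    # Pick a random template and substitute a random id, just like
    # select_points[math.random(#select_points)] plus string.format
    # in the Lua script
    template = rng.choice(select_points)
    return template % rng.randint(1, 3000000)

print(render_point_select())
```

Every iteration produces a structurally identical query with a fresh parameter, which is exactly why the queries are kept as templates rather than literal strings.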

Parse and execute

The function below, execute_selects, will be called from the parent function, event, which we discussed earlier in the post. You can see for-loops for each of the three SELECT categories we created above. The comments inline should help explain what is happening. Note the use of the user-provided option --point-selects in the second loop below, which we created previously in the ‘Sanity checks and options’ section.

function execute_selects()
  -- Execute each simple, no parameters, SELECT
  for i, o in ipairs(select_counts) do
    con:query(o)
  end

  -- Loop for however many queries the
  -- user wants to execute in this category
  for i = 1, sysbench.opt.point_selects do
    -- select random query from list
    local randQuery = select_points[math.random(#select_points)]
    -- generate random id and execute
    local id = sysbench.rand.pareto(1, 3000000)
    con:query(string.format(randQuery, id))
  end

  -- generate random string and execute
  for i, o in ipairs(select_string) do
    local str = sysbench.rand.string(string.rep("@", sysbench.rand.special(2, 15)))
    con:query(string.format(o, str))
  end
end

Two more things to mention for this code. First, you will notice the use of sysbench.rand.pareto to generate a random number between 1 and 3,000,000. For our dataset, we know that each table referenced in the queries relating to WHERE id = ? has at least that many rows. This is specific to our data; your values will certainly be different. Second, notice the use of sysbench.rand.string and string.rep. The string.rep segment generates a string composed of ‘@’ symbols, between 2 and 15 characters long. That string of ‘@’ symbols is then passed to sysbench.rand.string, which swaps out each ‘@’ for a random alphanumeric value. For example, ‘@@@@@@’ could be changed to ‘Hk9EdC’, which then replaces the ‘%s’ inside the query string (string.format) and is executed.
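To make that ‘@’-template behavior concrete, here is a small Python sketch of how, by my reading, sysbench.rand.string treats its argument. The function name sysbench_like_rand_string is purely illustrative, not a real sysbench API:

```python
import random
import string

def sysbench_like_rand_string(template, rng=random):
    # Each '@' in the template becomes a random alphanumeric character;
    # every other character passes through unchanged
    alnum = string.ascii_letters + string.digits
    return "".join(rng.choice(alnum) if ch == "@" else ch for ch in template)

# string.rep("@", 6) in Lua is simply "@" * 6 in Python
print(sysbench_like_rand_string("@" * 6))             # e.g. Hk9EdC
print(sysbench_like_rand_string("first-" + "@" * 4))  # e.g. first-x7Qp
```

Note how non-‘@’ characters survive intact, which is what makes templates like "first-@@@@" work later in the INSERT section.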

Handle inserts

Our INSERT statements require values. Again, sysbench calls the function execute_inserts from event on each iteration. Inside execute_inserts, we generate some fake string data using built-in functions described above.

Those strings are then formatted into the SQL and executed.

function create_random_email()
  local username = sysbench.rand.string(string.rep("@", sysbench.rand.uniform(5, 10)))
  local domain = sysbench.rand.string(string.rep("@", sysbench.rand.uniform(5, 10)))
  return username .. "@" .. domain .. ".com"
end

function execute_inserts()
  -- generate fake email/info
  local email = create_random_email()
  local firstname = sysbench.rand.string("first-" .. string.rep("@", sysbench.rand.special(2, 15)))
  local lastname = sysbench.rand.string("last-" .. string.rep("@", sysbench.rand.special(2, 15)))

  -- INSERT for new imdb.user
  con:query(string.format(inserts[1], email, firstname, lastname))

  -- INSERT for imdb.page_view
  local page = page_types[math.random(#page_types)]
  con:query(string.format(inserts[2], page, sysbench.rand.special(2, 500000), sysbench.rand.special(2, 500000)))
end

Example run

$ sysbench imdb_workload.lua \
    --mysql-user=imdb --mysql-password=imdb \
    --mysql-db=imdb --report-interval=1 \
    --events=0 --time=0 run
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 1
Report intermediate results every 1 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 1s ] thds: 1 tps: 15.96 qps: 177.54 (r/w/o: 112.71/31.92/32.91) lat (ms,95%): 158.63 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 1 tps: 15.01 qps: 169.09 (r/w/o: 109.06/30.02/30.02) lat (ms,95%): 137.35 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 1 tps: 26.00 qps: 285.00 (r/w/o: 181.00/52.00/52.00) lat (ms,95%): 108.68 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 1 tps: 15.00 qps: 170.00 (r/w/o: 108.00/32.00/30.00) lat (ms,95%): 164.45 err/s: 0.00 reconn/s: 0.00

And there we have it! Custom queries specific to our application and dataset. Most of the sysbench parameters are self-explanatory, but --report-interval=1 deserves a mention: it prints statistics every second. Normally sysbench does not output stats until the end of the run; however, this example execution runs forever (--events=0 --time=0), so we need stats to show continuously. You can adjust the parameters to your liking. For instance, if you only want to run a test for 5 minutes, set --events=0 --time=300.


Sysbench is a very well designed application that allows you to load-test your MySQL instances using pre-defined and custom queries. Using the Lua scripting language, you can create just about any scenario to fit your needs. The above is just one example that we use within Percona’s Training and Education department. It is by no means an exhaustive example of all of the capabilities of sysbench-Lua.

Photo by Lachlan Donald on Unsplash


MySQL-python: Adding caching_sha2_password and TLSv1.2 Support


Python 2 reaches EOL on 2020-01-01, and one of its commonly used third-party packages is MySQL-python. Since MySQL-python does not support Python 3, if you have not yet migrated away from both of these, you may have come across some issues when using more recent versions of MySQL and enforcing a secured installation. This post will look at two specific issues that you may come across (caching_sha2_password in MySQL 8.0, and TLSv1.2 with MySQL >= 5.7.10 when using OpenSSL) that will prevent you from connecting to a MySQL database, and will buy you a little extra time to migrate code.

For CentOS 7, MySQL-python is built against the client library provided by the mysql-devel package, which does not support some of the newer features, such as caching_sha2_password (the new default authentication plugin as of MySQL 8.0.4) and TLSv1.2. This means that you cannot take advantage of these security features and therefore leave yourself with an increased level of vulnerability.

So, what can we do about this? Thankfully, it is very simple to resolve, so long as you don’t mind getting your hands dirty!

Help! MySQL-python won’t connect to my MySQL 8.0 instance

First of all, let’s take a look at the issues that the latest version of MySQL-python (MySQL-python-1.2.5-1.el7.rpm for CentOS 7) has when connecting to a MySQL 8.0 instance. We will use a Docker container to help test this out by installing MySQL-python along with the Percona Server 8.0 client so that we can compare the two.

=> docker run --rm -it --network host --name rpm-build centos:latest
# yum install -y -q MySQL-python
warning: /var/cache/yum/x86_64/7/base/packages/MySQL-python-1.2.5-1.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID f4a80eb5: NOKEY
Public key for MySQL-python-1.2.5-1.el7.x86_64.rpm is not installed
Importing GPG key 0xF4A80EB5:
 Userid     : "CentOS-7 Key (CentOS 7 Official Signing Key) <>"
 Fingerprint: 6341 ab27 53d7 8a78 a7c2 7bb1 24c6 a8a7 f4a8 0eb5
 Package    : centos-release-7-6.1810.2.el7.centos.x86_64 (@CentOS)
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
# yum install -q -y; percona-release setup ps80
* Enabling the Percona Original repository
<*> All done!
The percona-release package now contains a percona-release script that can enable additional repositories for our newer products.
For example, to enable the Percona Server 8.0 repository use:
  percona-release setup ps80
Note: To avoid conflicts with older product versions, the percona-release setup command may disable our original repository for some products.
For more information, please visit:
* Disabling all Percona Repositories
* Enabling the Percona Server 8.0 repository
* Enabling the Percona Tools repository
<*> All done!
# yum install -q -y percona-server-client
warning: /var/cache/yum/x86_64/7/ps-80-release-x86_64/packages/percona-server-shared-8.0.15-5.1.el7.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 8507efa5: NOKEY
Public key for percona-server-shared-8.0.15-5.1.el7.x86_64.rpm is not installed
Importing GPG key 0x8507EFA5:
 Userid     : "Percona MySQL Development Team (Packaging key) <>"
 Fingerprint: 4d1b b29d 63d9 8e42 2b21 13b1 9334 a25f 8507 efa5
 Package    : percona-release-1.0-11.noarch (installed)
 From       : /etc/pki/rpm-gpg/PERCONA-PACKAGING-KEY

Next we will create a client config to save some typing later on and then check that we can connect to the MySQL instance that is already running; if you don’t have MySQL running on your local machine then you can install it in the container.

# cat <<EOS > /root/.my.cnf
> [client]
> user = testme
> password = my-secret-pw
> host =
> protocol = tcp
> ssl-mode = REQUIRED
> EOS
mysql> /* hide passwords */ PAGER sed "s/AS '.*' REQUIRE/AS 'xxx' REQUIRE/" ;
PAGER set to 'sed "s/AS '.*' REQUIRE/AS 'xxx' REQUIRE/"'
mysql> SELECT @@version, @@version_comment, ssl_version, ssl_cipher, user FROM sys.session_ssl_status INNER JOIN sys.processlist ON thread_id = thd_id AND conn_id = CONNECTION_ID();
8.0.12-1        Percona Server (GPL), Release '1', Revision 'b072566'   TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256     testme@localhost
All looks good here, so now we will check that MySQLdb can also connect:
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2059, "Authentication plugin 'caching_sha2_password' cannot be loaded: /usr/lib64/mysql/plugin/ cannot open shared object file: No such file or directory")

Changing the user’s authentication plugin

We have hit the first issue, because MySQL 8.0 introduced the caching_sha2_password plugin and made it the default authentication plugin, so we can’t connect at all. However, we can gain access by changing the grants for the user to use the old default plugin and then test again.

mysql> ALTER USER 'testme'@'%' IDENTIFIED WITH mysql_native_password BY "my-secret-pw";
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (1045, "Access denied for user 'testme'@'localhost' (using password: YES)")

Configuring SSL options

We still can’t connect, so what can be wrong? Well, we haven’t added any SSL details to the config other than specifying that we need to use SSL, so we will add the necessary options and make sure that we can connect.

# cat <<EOS >> /root/.my.cnf
> ssl-ca = /root/certs/ca.pem
> ssl-cert = /root/certs/client-cert.pem
> ssl-key = /root/certs/client-key.pem
> EOS
# mysql -Bse "select 1"
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
(('8.0.12-1', "Percona Server (GPL), Release '1', Revision 'b072566'", 'TLSv1', 'ECDHE-RSA-AES256-SHA', 'testme@localhost'),)

Forcing TLSv1.2 or later to further secure connections

That looks much better now, but if you look closely then you will notice that the MySQLdb connection is using TLSv1, which will make your security team either sad, angry or perhaps both as the connection can be downgraded! We want to secure the installation and keep data over the wire safe from prying eyes, so we will remove TLSv1 and also TLSv1.1 from the list of versions accepted by MySQL and leave only TLSv1.2 (TLSv1.3 is not supported with our current MySQL 8.0 and OpenSSL versions). Any guesses what will happen now?
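As an aside, the same floor can be enforced from the client side in modern Python 3 via the standard ssl module; this is a sketch of the downgrade-prevention idea, not something the old MySQL-python exposes:

```python
import ssl

# Build a client-side TLS context that refuses anything below TLSv1.2,
# mirroring what SET PERSIST_ONLY tls_version = "TLSv1.2" does server-side
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.minimum_version)
```

With both ends pinned to TLSv1.2 or later, a downgrade to TLSv1.0/1.1 fails the handshake rather than silently succeeding.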

mysql> SET PERSIST_ONLY tls_version = "TLSv1.2"; RESTART;
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/MySQLdb/", line 193, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2026, 'SSL connection error: error:00000001:lib(0):func(0):reason(1)')

MySQLdb can no longer connect to MySQL, sadly with a rather cryptic message! However, as we made the change that triggered this we don’t need to decipher it and can now start looking to add TLSv1.2 support to MySQLdb, so roll up your sleeves!

Solution: Build a new RPM

In order to build a new RPM we will need to do a little extra work in the container first of all, but it doesn’t take long and is pretty simple to do. We are going to install the necessary packages to create a basic build environment and then rebuild the MySQL-python RPM from its current source RPM.

## Download packages
# yum install -q -y rpm-build yum-utils gnupg2 rsync deltarpm gcc
Package yum-utils-1.1.31-50.el7.noarch already installed and latest version
Package gnupg2-2.0.22-5.el7_5.x86_64 already installed and latest version
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
## Create build directory tree
# install -d /usr/local/builds/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
## Configure the RPM macro
# echo "%_topdir /usr/local/builds/rpmbuild" > ~/.rpmmacros
## Switch to a temporary directory to ease cleanup
# cd "$(mktemp -d)"; pwd
## Download the source RPM
# yumdownloader --source -q MySQL-python
Enabling updates-source repository
Enabling base-source repository
Enabling extras-source repository
## Extract the source RPM
# rpm2cpio "$(ls -1 MySQL-python*src.rpm)" | cpio -idmv
234 blocks

We are now ready to start making some changes to the source code and build specifications, but first of all we need to take note of another change that occurred in MySQL 8.0. Older code will reference my_config.h, which has since been removed and is no longer required for building; the fix for this will be shown below.

## Adjust the spec file to use percona-server-devel and allow a build number
# sed -i "s/mysql-devel/percona-server-devel/g; s/Release: 1/Release: %{buildno}/" MySQL-python.spec
## Store the ZIP filename and extract
# SRC_ZIP="$(ls -1 MySQL-python*.zip)"; unzip "${SRC_ZIP}"
## Store the source directory and remove the include of my_config.h
# SRC_DIR=$(find . -maxdepth 1 -type d -name "MySQL-python*"); sed -i 's/#include "my_config.h"/#define NO_MY_CONFIG/' "${SRC_DIR}/_mysql.c"
## View our _mysql.c change
# fgrep -m1 -B3 -A1 -n NO_MY_CONFIG "${SRC_DIR}/_mysql.c"
41-#if defined(MS_WINDOWS)
44:#define NO_MY_CONFIG
## Update the source
# zip -uv "${SRC_ZIP}" "${SRC_DIR}/_mysql.c"
updating: MySQL-python-1.2.5/_mysql.c   (in=84707) (out=17314) (deflated 80%)
total bytes=330794, compressed=99180 -> 70% savings

Now that we have adjusted the source code and specification we can start work on the new package so that we can once again connect to MySQL.

## Sync the source to the build tree
# rsync -ac ./ /usr/local/builds/rpmbuild/SOURCES/
## Copy the new specification file to the build tree
# cp -a MySQL-python.spec /usr/local/builds/rpmbuild/SPECS/
## Build a new source RPM with our updates
# rpmbuild --define "buildno 2" -bs /usr/local/builds/rpmbuild/SPECS/MySQL-python.spec
Wrote: /usr/local/builds/rpmbuild/SRPMS/MySQL-python-1.2.5-2.el7.src.rpm
## Use the new source RPM to install any missing dependencies
# yum-builddep -q -y /usr/local/builds/rpmbuild/SRPMS/MySQL-python-1.2.5-2.el7.src.rpm &>>debug.log
## Build a new RPM
# rpmbuild --define "buildno 2" --rebuild /usr/local/builds/rpmbuild/SRPMS/MySQL-python-1.2.5-2.el7.src.rpm &>>debug.log
# tail -n1 debug.log
+ exit 0

All went well and so we can now install the new package and see if it worked.

## Install the local RPM
# yum localinstall -q -y /usr/local/builds/rpmbuild/RPMS/x86_64/MySQL-python-1.2.5-2.el7.x86_64.rpm
## Check to see which libmysqlclient is used
# ldd /usr/lib64/python2.7/site-packages/ | fgrep libmysqlclient
        => /usr/lib64/mysql/ (0x00007f61660be000)
## Test the connection
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
(('8.0.12-1', "Percona Server (GPL), Release '1', Revision 'b072566'", 'TLSv1.2', 'ECDHE-RSA-AES128-GCM-SHA256', 'testme@localhost'),)

Almost there… now force user authentication with the caching_sha2_password plugin

Hurrah! We can once again connect to the database and this time we are using TLSv1.2 and have thus increased our security. There is one thing left to do now though. Earlier on we needed to change the authentication plugin, so we will now change it back for extra security and see if all is still well.

mysql> ALTER USER 'testme'@'%' IDENTIFIED WITH caching_sha2_password BY "my-secret-pw";
# /usr/bin/python -c "import MySQLdb; dbc=MySQLdb.connect(read_default_file='~/.my.cnf').cursor(); dbc.execute('select @@version, @@version_comment, ssl_version, ssl_cipher, user from sys.session_ssl_status inner join sys.processlist on thread_id = thd_id and conn_id = connection_id()'); print dbc.fetchall()"
(('8.0.12-1', "Percona Server (GPL), Release '1', Revision 'b072566'", 'TLSv1.2', 'ECDHE-RSA-AES128-GCM-SHA256', 'testme@localhost'),)

Mission successful! Hopefully, if you are finding yourself in need of a little extra time migrating away from MySQLdb then this will help.

Image modified from photo by David Clode on Unsplash
