Oct
01
2017
--

One Million Tables in MySQL 8.0


In my previous blog post, I talked about the new general tablespaces in MySQL 8.0. Recently, MySQL 8.0.3-rc was released, which includes a new data dictionary. My goal is to create one million tables in MySQL 8.0 and test the performance.

Background questions

Q: Why one million tables in MySQL? Is it even realistic? How does this happen?

Usually, millions of tables in MySQL are the result of a “schema per customer” Software as a Service (SaaS) approach. For the purposes of customer data isolation (security) and logical data partitioning (performance), each “customer” has a dedicated schema. You can think of a WordPress hosting service (or any CMS-based hosting) where each customer has their own dedicated schema. With 10K customers per MySQL server, we could end up with millions of tables.

Q: Should you design an application with >1 million tables?

Having separate tables is one of the easiest designs for a multi-tenant or SaaS application, and it makes it easy to shard and redistribute your workload between servers. In fact, the table-per-customer or schema-per-customer design has the quickest time-to-market, which is why we see it a lot in consulting. In this post, we are not aiming to cover the merits of whether you should do this (if your application has high churn or millions of free users, for example, it might not be a good idea). Instead, we will focus on whether the new data dictionary provides relief for a historical pain point.

Q: Why is one million tables a problem?

The main issue results from the fact that MySQL needs to open (and eventually close) the table structure file (FRM file). With one million tables, we are talking about at least one million files. MySQL tries to address this with table_open_cache and table_definition_cache. However, the maximum value for table_open_cache is 524288. In addition, it is split into 16 partitions by default (to reduce contention), so it is not ideal. MySQL 8.0 has removed FRM files for InnoDB, and now allows you to create general tablespaces. I’ve demonstrated how we can create a tablespace per customer in MySQL 8.0, which is ideal for the “schema-per-customer” approach (we can move/migrate one customer’s data to a new server by exporting/importing the tablespace).
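To see how close a server is to these limits, the cache sizes and their hit/miss counters can be checked with standard SHOW statements (a quick check, not part of the benchmark itself; the same counters are discussed later in this post):

-- Configured table cache sizes
SHOW GLOBAL VARIABLES LIKE 'table_open_cache%';
SHOW GLOBAL VARIABLES LIKE 'table_definition_cache';
-- How often tables and table definitions had to be (re)opened
SHOW GLOBAL STATUS LIKE 'Opened_table%';
SHOW GLOBAL STATUS LIKE 'Table_open_cache%';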

One million tables in MySQL 5.7

I recently created a test with one million tables. The test creates 10K databases, and each database contains 100 tables. To use a standard benchmark, I employed the sysbench table structure.

mysql> select count(*) from information_schema.schemata where schema_name like 'test_sbtest%';
+----------+
| count(*) |
+----------+
| 10000    |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from information_schema.tables where table_schema like 'test_sbtest%';
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (4.61 sec)

This also creates a huge overhead: with one million tables we have ~two million files. The .frm and .ibd files together add up to 175G:

# du -sh /ssd/mysql_57
175G    /ssd/mysql_57
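To sanity-check the file count (a simple check, not part of the original benchmark), you can count the per-table files directly:

# Count the per-table files under the MySQL 5.7 datadir (~1M of each type expected)
find /ssd/mysql_57 -name "*.frm" | wc -l
find /ssd/mysql_57 -name "*.ibd" | wc -l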

Now I’ve used a sysbench Lua script to insert one row into a randomly chosen table:

pathtest = "/usr/share/sysbench/tests/include/oltp_legacy/"
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end
function event()
   local table_name
   local i
   local c_val
   local k_val
   local pad_val
   local oltp_tables_count = 100
   local oltp_db_count = 10000
   table_name = "test_sbtest_" .. sb_rand_uniform(1, oltp_db_count) .. ".sbtest".. sb_rand_uniform(1, oltp_tables_count)
   -- id for the insert; oltp_table_size is the standard sysbench option set on the command line
   i = sb_rand(1, oltp_table_size)
   k_val = sb_rand(1, oltp_table_size)
   c_val = sb_rand_str([[
###########-###########-###########-###########-###########-###########-###########-###########-###########-###########]])
   pad_val = sb_rand_str([[
###########-###########-###########-###########-###########]])
   rs = db_query("INSERT INTO " .. table_name ..
                       " (id, k, c, pad) VALUES " ..
                       string.format("(%d, %d, '%s', '%s')", i, k_val, c_val,
                                     pad_val))
end

With:

local oltp_tables_count = 100
local oltp_db_count = 10000

Sysbench will choose one table randomly out of one million. With oltp_tables_count = 1 and oltp_db_count = 100, it will only choose the first table (sbtest1) out of the first 100 databases (randomly).

As expected, MySQL 5.7 has a huge performance degradation when going across one million tables. When running a script that only inserts data into 100 random tables, we can see ~150K transactions per second. When the data is inserted in one million tables (chosen randomly) performance drops to 2K (!) transactions per second:

Insert into 100 random tables:

SQL statistics:
    queries performed:
        read:                            0
        write:                           16879188
        other:                           0
        total:                           16879188
    transactions:                        16879188 (140611.72 per sec.)
    queries:                             16879188 (140611.72 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Insert into one million random tables:

SQL statistics:
    queries performed:
        read:                            0
        write:                           243533
        other:                           0
        total:                           243533
    transactions:                        243533 (2029.21 per sec.)
    queries:                             243533 (2029.21 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

This is expected. Here I’m testing the worst-case scenario, where we can’t keep all of the table open handlers and table definitions in the cache (memory), since table_open_cache and table_definition_cache are both limited to 524288.

Also, normally we can expect a huge skew in access across the tables. There may be only 20% active customers (the 80-20 rule), meaning that we can expect active access to only ~2K databases. In addition, there will be old or unused tables, so we can expect around 100K active tables or fewer.

Hardware and config files

The above results are from this server:

Processors   | 64xGenuine Intel(R) CPU @ 2.00GHz
Memory Total | 251.8G
Disk         | Samsung 950 Pro PCIE SSD (nvme)

Sysbench script:

sysbench $conn --report-interval=1 --num-threads=32 --max-requests=0 --max-time=600 --test=/root/drupal_demo/insert_custom.lua run

My.cnf:

innodb_buffer_pool_size = 100G
innodb_io_capacity=20000
innodb_flush_log_at_trx_commit = 0
innodb_log_file_size = 2G
innodb_flush_method=O_DIRECT_NO_FSYNC
skip-log-bin
open_files_limit=1000000
table_open_cache=524288
table_definition_cache=524288

One million tables in MySQL 8.0 + general tablespaces

In MySQL 8.0 it is easy and logical to create one general tablespace per schema (it will host all tables in that schema). In MySQL 5.7, general tablespaces are also available, but there are still .frm files.

I’ve used the following script to create 10K tables in one schema, all in one tablespace:

mysql test -e "CREATE TABLESPACE t ADD DATAFILE 't.ibd' engine=InnoDB;"
for i in {1..10000}
do
           mysql test -e "create table ab$i(i int) tablespace t"
done

The new MySQL 8.0.3-rc also uses the new data dictionary, so all MyISAM tables in the mysql schema are removed and all metadata is stored in an additional mysql.ibd file.

Creating one million tables

Creating InnoDB tables quickly can be a task in itself. Stewart Smith published a blog post a while ago where he focused on optimizing the time needed to create 30K tables in MySQL.

The problem is that after creating an .ibd file, MySQL needs to “fsync” it. However, when creating a table inside an existing general tablespace, there is no fsync. I’ve created a simple script to create tables in parallel, one thread per database:

#!/bin/bash
function do_db {
        mysql -vvv -e "create database $db";
        mysql -vvv $db -e "CREATE TABLESPACE $db ADD DATAFILE '$db.ibd' engine=InnoDB;"
        for i in {1..100}
        do
                table="CREATE TABLE sbtest$i ( id int(10) unsigned NOT NULL AUTO_INCREMENT, k int(10) unsigned NOT NULL DEFAULT '0', c varchar(120) NOT NULL DEFAULT '', pad varchar(60) NOT NULL DEFAULT '', PRIMARY KEY (id), KEY k_1 (k) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 tablespace $db;"
                mysql $db -e "$table"
        done
}
c=0
for m in {1..100}
do
        for i in {1..100}
        do
                let c=$c+1
                echo $c
                db="test_sbtest_$c"
                do_db &
        done
        wait
done

That script works perfectly in MySQL 8.0.1-dmr and creates one million tables in 25 minutes and 28 seconds (1528 seconds). That is ~654 tables per second, significantly faster than the ~30 tables per second in Stewart’s original test, and 2x faster than a test where all fsyncs were artificially disabled using the libeat-my-data library.

Unfortunately, MySQL 8.0.3-rc introduced a regression: I can see heavy mutex contention, and the table creation time increased from ~25 minutes to ~280 minutes. I’ve filed a bug report: performance regression: “create table” speed and scalability in 8.0.3.

Size on disk

With general tablespaces and no .frm files, the size on disk decreased:

# du -h -d1 /ssd/
147G    /ssd/mysql_801
119G    /ssd/mysql_803
175G    /ssd/mysql_57

Please note, though, that in MySQL 8.0.3-rc, with the new native data dictionary, the size on disk increases as MySQL needs to write additional information (Serialized Dictionary Information, SDI) to the tablespace files:

InnoDB: Serialized Dictionary Information (SDI) is now present in all InnoDB tablespace files
except for temporary tablespace and undo tablespace files.
SDI is serialized metadata for schema, table, and tablespace objects.
The presence of SDI data provides metadata redundancy.
...
The inclusion of SDI data in tablespace files increases tablespace file size.
An SDI record requires a single index page, which is 16k in size by default.
However, SDI data is compressed when it is stored to reduce the storage footprint.

The general mysql data dictionary in MySQL 8.0.3 is 6.6GB:

6.6G /ssd/mysql/mysql.ibd

Benchmarking the insert speed in MySQL 8.0 

I’ve repeated the same test I did for MySQL 5.7 in MySQL 8.0.3-rc (and in 8.0.1-dmr), but using general tablespaces. I created 10K databases (= 10K tablespace files); each database has 100 tables and resides in its own tablespace.

There are two new tablespace-level caches we can use in MySQL 8.0: tablespace_definition_cache and schema_definition_cache:

tablespace_definition_cache = 15000
schema_definition_cache = 524288
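A quick way to verify these settings at runtime (a small check, not part of the benchmark itself):

-- Shows tablespace_definition_cache, schema_definition_cache and table_definition_cache
SHOW GLOBAL VARIABLES LIKE '%definition_cache';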

Unfortunately, with one million random table accesses in MySQL 8.0 (both 8.0.1 and 8.0.3), we can still see that it stalls on opening tables (even with no .frm files and general tablespaces):

mysql> select conn_id, current_statement, state, statement_latency, lock_latency from sys.processlist where current_statement is not null and conn_id <> CONNECTION_ID();
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
| conn_id | current_statement                                                 | state          | statement_latency | lock_latency |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
|     199 | INSERT INTO test_sbtest_9749.s ... 8079-53209333270-93105555128') | Opening tables | 4.45 ms           | 0 ps         |
|     198 | INSERT INTO test_sbtest_1863.s ... 9574-29782886623-39251573705') | Opening tables | 9.95 ms           | 5.67 ms      |
|     189 | INSERT INTO test_sbtest_3948.s ... 9365-63502117132-66650064067') | Opening tables | 16.29 ms          | 15.38 ms     |
|     190 | INSERT INTO test_sbtest_6885.s ... 8436-41291265610-60894472357') | Opening tables | 13.78 ms          | 9.52 ms      |
|     191 | INSERT INTO test_sbtest_247.sb ... 7467-89459234028-92064334941') | Opening tables | 8.36 ms           | 3.18 ms      |
|     192 | INSERT INTO test_sbtest_9689.s ... 8058-74586985382-00185651578') | Opening tables | 6.89 ms           | 0 ps         |
|     193 | INSERT INTO test_sbtest_8777.s ... 1900-02582963670-01868315060') | Opening tables | 7.09 ms           | 5.70 ms      |
|     194 | INSERT INTO test_sbtest_9972.s ... 9057-89011320723-95018545652') | Opening tables | 9.44 ms           | 9.35 ms      |
|     195 | INSERT INTO test_sbtest_6977.s ... 7902-29158428721-66447528241') | Opening tables | 7.82 ms           | 789.00 us    |
|     196 | INSERT INTO test_sbtest_129.sb ... 2091-86346366083-87657045906') | Opening tables | 13.01 ms          | 7.30 ms      |
|     197 | INSERT INTO test_sbtest_1418.s ... 6581-90894769279-68213053531') | Opening tables | 16.35 ms          | 10.07 ms     |
|     208 | INSERT INTO test_sbtest_4757.s ... 4592-86183240946-83973365617') | Opening tables | 8.66 ms           | 2.84 ms      |
|     207 | INSERT INTO test_sbtest_2152.s ... 5459-55779113235-07063155183') | Opening tables | 11.08 ms          | 3.89 ms      |
|     212 | INSERT INTO test_sbtest_7623.s ... 0354-58204256630-57234862746') | Opening tables | 8.67 ms           | 2.80 ms      |
|     215 | INSERT INTO test_sbtest_5216.s ... 9161-37142478639-26288001648') | Opening tables | 9.72 ms           | 3.92 ms      |
|     210 | INSERT INTO test_sbtest_8007.s ... 2999-90116450579-85010442132') | Opening tables | 1.33 ms           | 0 ps         |
|     203 | INSERT INTO test_sbtest_7173.s ... 2718-12894934801-25331023143') | Opening tables | 358.09 us         | 0 ps         |
|     209 | INSERT INTO test_sbtest_1118.s ... 8361-98642762543-17027080501') | Opening tables | 3.32 ms           | 0 ps         |
|     219 | INSERT INTO test_sbtest_5039.s ... 1740-21004115002-49204432949') | Opening tables | 8.56 ms           | 8.44 ms      |
|     202 | INSERT INTO test_sbtest_8322.s ... 8686-46403563348-31237202393') | Opening tables | 1.19 ms           | 0 ps         |
|     205 | INSERT INTO test_sbtest_1563.s ... 6753-76124087654-01753008993') | Opening tables | 9.62 ms           | 2.76 ms      |
|     213 | INSERT INTO test_sbtest_5817.s ... 2771-82142650177-00423653942') | Opening tables | 17.21 ms          | 16.47 ms     |
|     216 | INSERT INTO test_sbtest_238.sb ... 5343-25703812276-82353892989') | Opening tables | 7.24 ms           | 7.20 ms      |
|     200 | INSERT INTO test_sbtest_2637.s ... 8022-62207583903-44136028229') | Opening tables | 7.52 ms           | 7.39 ms      |
|     204 | INSERT INTO test_sbtest_9289.s ... 2786-22417080232-11687891881') | Opening tables | 10.75 ms          | 9.01 ms      |
|     201 | INSERT INTO test_sbtest_6573.s ... 0106-91679428362-14852851066') | Opening tables | 8.43 ms           | 7.03 ms      |
|     217 | INSERT INTO test_sbtest_1071.s ... 9465-09453525844-02377557541') | Opening tables | 8.42 ms           | 7.49 ms      |
|     206 | INSERT INTO test_sbtest_9588.s ... 8804-20770286377-79085399594') | Opening tables | 8.02 ms           | 7.50 ms      |
|     211 | INSERT INTO test_sbtest_4657.s ... 4758-53442917995-98424096745') | Opening tables | 16.62 ms          | 9.76 ms      |
|     218 | INSERT INTO test_sbtest_9672.s ... 1537-13189199316-54071282928') | Opening tables | 10.01 ms          | 7.41 ms      |
|     214 | INSERT INTO test_sbtest_1391.s ... 9241-84702335152-38653248940') | Opening tables | 21.34 ms          | 15.54 ms     |
|     220 | INSERT INTO test_sbtest_6542.s ... 7778-65788940102-87075246009') | Opening tables | 2.96 ms           | 0 ps         |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
32 rows in set (0.11 sec)

And transactions per second drop to ~2K.

Here I expected different behavior. With the .frm files gone, and with tablespace_definition_cache set to more than 10K (we have only 10K tablespace files), I expected that MySQL would not have to open and close files. It looks like this is not the case.

I can also see the table-opening activity (since the server started):

mysql> show global status like '%open%';
+----------------------------+-----------+
| Variable_name              | Value     |
+----------------------------+-----------+
| Com_ha_open                | 0         |
| Com_show_open_tables       | 0         |
| Innodb_num_open_files      | 10040     |
| Open_files                 | 0         |
| Open_streams               | 0         |
| Open_table_definitions     | 524288    |
| Open_tables                | 499794    |
| Opened_files               | 22        |
| Opened_table_definitions   | 1220904   |
| Opened_tables              | 2254648   |
| Slave_open_temp_tables     | 0         |
| Table_open_cache_hits      | 256866421 |
| Table_open_cache_misses    | 2254643   |
| Table_open_cache_overflows | 1254766   |
+----------------------------+-----------+
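To turn those counters into a single number, the table open cache hit ratio can be computed with a small helper query (not from the original post; it assumes the Performance Schema is enabled, which is the default in 5.7 and 8.0):

-- Table open cache hit ratio derived from the counters above
SELECT hits.VARIABLE_VALUE / (hits.VARIABLE_VALUE + misses.VARIABLE_VALUE)
         AS table_open_cache_hit_ratio
  FROM performance_schema.global_status hits
  JOIN performance_schema.global_status misses
    ON hits.VARIABLE_NAME   = 'Table_open_cache_hits'
   AND misses.VARIABLE_NAME = 'Table_open_cache_misses';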

This is easier to see on the graphs from PMM. Inserts per second for the two runs (both running 16 threads):

  1. The first run is 10K random databases/tablespaces and one table (sysbench is choosing table#1 from a randomly chosen list of 10K databases). This way there is also no contention on the tablespace file.
  2. The second run is a randomly chosen table from a list of one million tables.

As we can see, the first run is doing 50K-100K inserts/second. The second run is limited to only ~2.5K inserts per second:

[PMM graph: inserts per second for the two runs]

“Table open cache misses” grows significantly after the start of the second benchmark run:
[PMM graph: table open cache misses]

As we can see, MySQL performs ~1.1K table definition openings per second and has ~2K table cache misses due to the overflow:

[PMM graph: table definition openings and table cache misses]

When inserting into only 1K random tables (one specific table in a random database, so that different threads almost always write to different tablespace files), the table_open_cache warmed up quickly. After a couple of seconds, the sysbench test starts showing > 100K tps. The processlist looks much better (compare the statement latency and lock latency to the results above as well):

mysql> select conn_id, current_statement, state, statement_latency, lock_latency from sys.processlist where current_statement is not null and conn_id <> CONNECTION_ID();
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
| conn_id | current_statement                                                 | state          | statement_latency | lock_latency |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
|     253 | INSERT INTO test_sbtest_3293.s ... 2282-95400708146-84684851551') | starting       | 22.72 us          | 0 ps         |
|     254 | INSERT INTO test_sbtest_3802.s ... 4030-35983148190-23616685226') | update         | 62.88 us          | 45.00 us     |
|     255 | INSERT INTO test_sbtest_5290.s ... 2361-58374942527-86207214617') | Opening tables | 36.07 us          | 0 ps         |
|     256 | INSERT INTO test_sbtest_5684.s ... 4717-34992549120-04746631452') | Opening tables | 37.61 us          | 37.00 us     |
|     257 | INSERT INTO test_sbtest_5088.s ... 5637-75275906887-76112520982') | starting       | 22.97 us          | 0 ps         |
|     258 | INSERT INTO test_sbtest_1375.s ... 8592-24036624620-65536442287') | query end      | 98.66 us          | 35.00 us     |
|     259 | INSERT INTO test_sbtest_8764.s ... 8566-02569157908-49891861265') | Opening tables | 47.13 us          | 37.00 us     |
|     260 | INSERT INTO test_sbtest_560.sb ... 2605-08226572929-25889530906') | query end      | 155.64 us         | 38.00 us     |
|     261 | INSERT INTO test_sbtest_7776.s ... 0243-86335905542-37976752368') | System lock    | 46.68 us          | 32.00 us     |
|     262 | INSERT INTO test_sbtest_6551.s ... 5496-19983185638-75401382079') | update         | 74.07 us          | 40.00 us     |
|     263 | INSERT INTO test_sbtest_7765.s ... 5428-29707353898-77023627427') | update         | 71.35 us          | 45.00 us     |
|     265 | INSERT INTO test_sbtest_5771.s ... 7065-03531013976-67381721569') | query end      | 138.42 us         | 39.00 us     |
|     266 | INSERT INTO test_sbtest_8603.s ... 7158-66470411444-47085285977') | update         | 64.00 us          | 36.00 us     |
|     267 | INSERT INTO test_sbtest_3983.s ... 5039-55965227945-22430910215') | update         | 21.04 ms          | 39.00 us     |
|     268 | INSERT INTO test_sbtest_8186.s ... 5418-65389322831-81706268892') | query end      | 113.58 us         | 37.00 us     |
|     269 | INSERT INTO test_sbtest_1373.s ... 1399-08304962595-55155170406') | update         | 131.97 us         | 59.00 us     |
|     270 | INSERT INTO test_sbtest_7624.s ... 0589-64243675321-62971916496') | query end      | 120.47 us         | 38.00 us     |
|     271 | INSERT INTO test_sbtest_8201.s ... 6888-31692084119-80855845726') | query end      | 109.97 us         | 37.00 us     |
|     272 | INSERT INTO test_sbtest_7054.s ... 3674-32329064814-59707699237') | update         | 67.99 us          | 35.00 us     |
|     273 | INSERT INTO test_sbtest_3019.s ... 1740-35410584680-96109859552') | update         | 5.21 ms           | 33.00 us     |
|     275 | INSERT INTO test_sbtest_7657.s ... 4985-72017519764-59842283878') | update         | 88.91 us          | 48.00 us     |
|     274 | INSERT INTO test_sbtest_8606.s ... 0580-38496560423-65038119567') | freeing items  | NULL              | 37.00 us     |
|     276 | INSERT INTO test_sbtest_9349.s ... 0295-94997123247-88008705118') | starting       | 25.74 us          | 0 ps         |
|     277 | INSERT INTO test_sbtest_3552.s ... 2080-59650597118-53885660147') | starting       | 32.23 us          | 0 ps         |
|     278 | INSERT INTO test_sbtest_3832.s ... 1580-27778606266-19414961452') | freeing items  | 194.14 us         | 51.00 us     |
|     279 | INSERT INTO test_sbtest_7685.s ... 0234-22016898044-97277319766') | update         | 62.66 us          | 40.00 us     |
|     280 | INSERT INTO test_sbtest_6026.s ... 2629-36599580811-97852201188') | Opening tables | 49.41 us          | 37.00 us     |
|     281 | INSERT INTO test_sbtest_8273.s ... 7957-39977507737-37560332932') | update         | 92.56 us          | 36.00 us     |
|     283 | INSERT INTO test_sbtest_8584.s ... 7604-24831943860-69537745471') | starting       | 31.20 us          | 0 ps         |
|     284 | INSERT INTO test_sbtest_3787.s ... 1644-40368085836-11529677841') | update         | 100.41 us         | 40.00 us     |
+---------+-------------------------------------------------------------------+----------------+-------------------+--------------+
30 rows in set (0.10 sec)

What about 100K random tables? That should fit into the table_open_cache. At the same time, the default 16 table_open_cache_instances split the ~500K table_open_cache, so each bucket is only ~30K. To fix that, I’ve set table_open_cache_instances = 4 and was able to get ~50K tps on average. However, the contention inside the table_open_cache seems to stall the queries:
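For reference, the adjusted settings for this run look like this in my.cnf (table_open_cache_instances is not a dynamic variable, so it has to be set in the configuration file before a server restart):

table_open_cache           = 524288
table_definition_cache     = 524288
table_open_cache_instances = 4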

[PMM graph: inserts per second, 100K random tables]

There is only a very limited number of table openings:

[PMM graph: table openings]

Conclusion

MySQL 8.0 general tablespaces look very promising. It is finally possible to create one million tables in MySQL without the need to create two million files. Actually, MySQL 8.0 can handle many tables very well, as long as table cache misses are kept to a minimum.

At the same time, the problem with “Opening tables” (the worst-case scenario test) still persists in MySQL 8.0.3-rc and limits the throughput. I expected that, with the .frm files gone, MySQL would no longer have to open and close files when opening tables. I also hope the CREATE TABLE regression bug is fixed in the next MySQL 8.0 version.

I have not tested other new features of the new data dictionary in 8.0.3-rc, e.g., atomic DDL (InnoDB now supports atomic DDL, which ensures that DDL operations are either committed in their entirety or rolled back in case of an unplanned server stoppage). That is a topic for the next blog post.

Jul
27
2017
--

What is MySQL Partitioning?


In this blog, we’ll quickly look at MySQL partitioning.

Partitioning is a way in which a database (MySQL in this case) splits its actual data into separate tables, while they are still treated as a single table by the SQL layer.

When partitioning, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE and DELETE statements should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather and queries ALL partitions in a UNION that is not concurrent.

Generally, you must add the partition key into the primary key along with the auto-increment column, i.e., PRIMARY KEY (part_id, id). If you don’t have well-designed, small columns for this composite primary key, it could enlarge all of your secondary indexes.

You can partition by range or hash. Range is great because you have groups of known IDs in each table, and it helps when querying across partition IDs. This still can create hotspots in the newest partition, as all new inserts go there. Partitioning by hash “load balances” the table, and allows you to write to partitions more concurrently. This makes range queries on the partition key a bad idea.
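As a rough illustration of both points (the table and column names here are hypothetical, not taken from a specific application), a hash-partitioned multi-tenant table could look like this:

-- customer_id is the natural partition key. MySQL requires every unique key
-- (including the PRIMARY KEY) to contain all columns of the partitioning
-- expression, hence PRIMARY KEY (customer_id, id).
CREATE TABLE orders (
  customer_id INT UNSIGNED NOT NULL,
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  amount DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (customer_id, id),
  KEY (id)  -- InnoDB needs the AUTO_INCREMENT column to lead an index
) ENGINE=InnoDB
  PARTITION BY HASH (customer_id)
  PARTITIONS 16;

-- A query that includes the partition key is pruned to a single partition:
SELECT * FROM orders WHERE customer_id = 42 AND created_at > '2017-01-01';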

In MySQL 5.7, partitioning became native to the storage engine, deprecating the old method where the MySQL server itself had to handle the partitions. This means InnoDB partitions (and a larger number of partitions) are a better choice than in the past.

As with all features and recommendations, this only makes sense if it helps your data and workload!

Jun
07
2017
--

ProxySQL Admin Interface Is Not Your Typical MySQL Server!


In this blog post, I’ll look at how ProxySQL Admin behaves in some unusual and unexpected ways from a MySQL perspective.

ProxySQL allows you to connect to its admin interface using the MySQL protocol and use familiar tools, like the MySQL command line client, to manage its configuration as a set of configuration tables. This ability may trick you into thinking that you’re working with a stripped-down MySQL server – and expect it to behave like MySQL. 

It would be a mistake to think this! In fact, ProxySQL embeds the SQLite database to store its configuration. As such, it behaves much closer to SQLite!

Below, I’ll show you a few things that confused me at first. All of these are as of ProxySQL 1.3.6 (in case behavior changes in the future).
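For context, you connect to the admin interface with the standard mysql client (a generic example; the port and credentials are defined by your admin-mysql_ifaces and admin-admin_credentials settings, and the values below are the defaults):

mysql -u admin -padmin -h 127.0.0.1 -P 6032 --prompt='ProxySQL Admin> '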

Fake support for the USE command

mysql> show databases;
+-----+---------+-------------------------------+
| seq | name    | file                          |
+-----+---------+-------------------------------+
| 0   | main    |                               |
| 2   | disk    | /var/lib/proxysql/proxysql.db |
| 3   | stats   |                               |
| 4   | monitor |                               |
+-----+---------+-------------------------------+
4 rows in set (0.00 sec)
mysql> select database();
+------------+
| DATABASE() |
+------------+
| admin      |
+------------+
1 row in set (0.00 sec)
mysql> use stats;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select database();
+------------+
| DATABASE() |
+------------+
| admin      |
+------------+
1 row in set (0.00 sec)
mysql> use funkydatabase;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

So here we can see that:

  • There is a concept of multiple databases in the ProxySQL admin interface
  • The ProxySQL admin interface supports the select database() function, but it always returns the same value regardless of the database you tried to set. Typically it will be “admin” or “stats”, depending on what user you use to connect to the database.

  • You can use the “use” command to change the database – but it does not really change the database. This is a required command, because if you don’t support it many MySQL clients will not connect.

Invisible tables

mysql> show tables;
+--------------------------------------+
| tables                               |
+--------------------------------------+
| global_variables                     |
| mysql_collations                     |
| mysql_query_rules                    |
| mysql_replication_hostgroups         |
| mysql_servers                        |
| mysql_users                          |
| runtime_global_variables             |
| runtime_mysql_query_rules            |
| runtime_mysql_replication_hostgroups |
| runtime_mysql_servers                |
| runtime_mysql_users                  |
| runtime_scheduler                    |
| scheduler                            |
+--------------------------------------+
13 rows in set (0.00 sec)
mysql> show tables from stats;
+--------------------------------+
| tables                         |
+--------------------------------+
| global_variables               |
| stats_mysql_commands_counters  |
| stats_mysql_connection_pool    |
| stats_mysql_global             |
| stats_mysql_processlist        |
| stats_mysql_query_digest       |
| stats_mysql_query_digest_reset |
| stats_mysql_query_rules        |
+--------------------------------+
8 rows in set (0.00 sec)
mysql> select count(*) from stats_mysql_commands_counters;
+----------+
| count(*) |
+----------+
| 52       |
+----------+
1 row in set (0.00 sec)

We can query the list of tables in our default database (which we can’t change), and we also get the list of tables in the “stats” database, using very familiar MySQL syntax. But we can also query the stats tables directly without specifying the “stats” database, even though they are not shown in “show tables” for our current database.

Again, this is SQLite behavior!

Strange Create Table syntax

mysql> show create table scheduler\G
*************************** 1. row ***************************
      table: scheduler
Create Table: CREATE TABLE scheduler (
   id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
   active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
   interval_ms INTEGER CHECK (interval_ms>=100 AND interval_ms<=100000000) NOT NULL,
   filename VARCHAR NOT NULL,
   arg1 VARCHAR,
   arg2 VARCHAR,
   arg3 VARCHAR,
   arg4 VARCHAR,
   arg5 VARCHAR,
   comment VARCHAR NOT NULL DEFAULT '')
1 row in set (0.00 sec)

If we look at the ProxySQL Admin interface table structure, we see that it is not quite MySQL. It uses CHECK constraints and doesn’t specify the length for VARCHAR. This is because it is a SQLite table definition.

SHOW command nuances

The ProxySQL Admin interface supports SHOW PROCESSLIST and even SHOW FULL PROCESSLIST commands, but not all the commands match the MySQL server output:

mysql> show processlist;
+-----------+---------------+--------+-----------+---------+---------+--------+
| SessionID | user          | db     | hostgroup | command | time_ms | info   |
+-----------+---------------+--------+-----------+---------+---------+--------+
| 129       | proxysql_user | sbtest | 10        | Query   | 14      | COMMIT |
| 130       | proxysql_user | sbtest | 10        | Query   | 16      | COMMIT |
| 131       | proxysql_user | sbtest | 10        | Query   | 9       | COMMIT |
| 133       | proxysql_user | sbtest | 10        | Query   | 0       | COMMIT |
| 134       | proxysql_user | sbtest | 10        | Query   | 5       | COMMIT |
….
| 191       | proxysql_user | sbtest | 10        | Query   | 4       | COMMIT |
| 192       | proxysql_user | sbtest | 10        | Query   | 1       | COMMIT |
+-----------+---------------+--------+-----------+---------+---------+--------+
62 rows in set (0.01 sec)

SHOW VARIABLES works, as does SHOW GLOBAL VARIABLES, but not SHOW SESSION VARIABLES.

SHOW STATUS doesn’t work as expected:

mysql> show status;
ERROR 1045 (#2800): near "show": syntax error

As you can see, while some typical MySQL commands and constructs work, others don’t. This is by design: ProxySQL implemented some of the commands to make it easy and familiar for MySQL users to navigate the ProxySQL interface. But don’t get fooled! It is not MySQL, and doesn’t always behave as you would expect.

You’ve been warned!

Dec
28
2016
--

Using Percona XtraBackup on a MySQL Instance with a Large Number of Tables


In this blog post, we’ll find out how to use Percona XtraBackup on a MySQL instance with a large number of tables.

As of Percona XtraBackup 2.4.5, you are required to have enough open file handles to open every single InnoDB tablespace in the instance you’re trying to back up. So if you’re running with innodb_file_per_table=1 and have a large number of tables, you’re very likely to see Percona XtraBackup fail with the following error message:

InnoDB: Operating system error number 24 in a file operation.
InnoDB: Error number 24 means 'Too many open files'
InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
InnoDB: File ./sbtest/sbtest132841.ibd: 'open' returned OS error 124. Cannot continue operation
InnoDB: Cannot continue operation.

If you run into this issue, here is what you need to do:

  1. Find out how many files you need:
root@ts140i:~# find /var/lib/mysql/ -name "*.ibd" | wc -l
1000005

I would add at least another 1000 to this number for the system tablespace and other miscellaneous open file needs. You might want to go even higher to accommodate a growing number of tables.

  2. Check the maximum number of files you can keep open in the system. If this number is too small, Percona XtraBackup might monopolize the open files in the system, causing other processes to fail when they try to open files; this can even cause MySQL Server to crash.
root@ts140i:/mnt/data/backup# cat /proc/sys/fs/file-max
3262006

If you need to, here is how to increase the number:

sysctl -w fs.file-max=5000000
echo "fs.file-max=5000000" >> /etc/sysctl.conf

  3. Increase the limit on the number of files the Percona XtraBackup process can open:

The best way to do this is to use the --open-files-limit option. For example, you can specify the following in your my.cnf:

[xtrabackup]
open-files-limit=2000000

Alternatively, you can pass it as a command-line option, or run ulimit -n 2000000 before running the backup command.
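For example (a sketch with a hypothetical target directory; adjust the paths and connection credentials to your environment):

ulimit -n 2000000
xtrabackup --backup --open-files-limit=2000000 --target-dir=/mnt/data/backup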

You need to be sure your user account has permissions to set the open files limit this high. If you are doing backups under the “root” user, it shouldn’t be a problem. Otherwise, you might need to adjust the limits in /etc/security/limits.conf:

mysql hard nofile 2000000
mysql soft nofile 2000000

Specifying a “soft” limit in this file eliminates the need to run ulimit before Percona XtraBackup, or to specify it in the configuration.

  4. There is one more possible limit to overcome. Even running as the root user, you might get the following error message:
root@ts140i:/mnt/data/backup# ulimit -n 2000000
-su: ulimit: open files: cannot modify limit: Operation not permitted

If this happens, you might need to increase the kernel limit on the number of open files any single process can have:

pz@ts140i:~$ cat /proc/sys/fs/nr_open
1048576

The limit I have on this system is slightly above 1 million. You can increase it using the following:

sysctl -w fs.nr_open=2000000
echo "fs.nr_open=2000000" >> /etc/sysctl.conf

With these configuration adjustments, you should be able to use Percona XtraBackup to backup MySQL instances containing millions of tables without problems.

What if you can’t allow Percona XtraBackup to open that many files? Then there is the --close-files option, which normally doesn’t require increasing the open files limit. Using this option, however, might cause backup corruption if you’re doing DDL operations during the backup.

Where does this strange limitation, requiring you to keep all tablespaces open, come from? It comes from this issue. In some cases, DDL operations such as RENAME TABLE might cause the wrong file to be copied, in a way that cannot be fixed by replaying the InnoDB redo logs. Keeping the file open ensures that the file handle corresponds to the tablespace as it existed at the start of the backup process, so it gets handled correctly.

This problem is not unique to Percona XtraBackup. If anything, Percona XtraBackup goes the extra mile to ensure database backups are safe. For comparison, MySQL Enterprise Backup 4.0 simply states:

“Do not run the DDL operations ALTER TABLE, TRUNCATE TABLE, OPTIMIZE TABLE, REPAIR TABLE, RESTORE TABLE or CREATE INDEX while a backup operation is going on. The resulting backup might become corrupted.”
