Sep 30, 2020

MySQL 101: Tuning MySQL After Upgrading Memory

In this post, we will discuss what to do when you add more memory to your instance. Adding memory to a server where MySQL is running is common practice when scaling resources.

First, Some Context

Scaling resources is just adding more resources to your environment, and this can be split into two main approaches: vertical scaling and horizontal scaling.

Vertical scaling is increasing hardware capacity for a given instance, thus having a more powerful server, while horizontal scaling is adding more servers, a pretty standard approach for load balancing and sharding.

As traffic grows, working datasets get bigger, and we start to suffer because the data that doesn’t fit into memory has to be retrieved from disk. This is a costly operation, even with modern NVMe drives, so at some point we will need to deal with one of the scaling solutions we mentioned.

In this case, we will discuss adding more RAM, which is usually the fastest and easiest way to scale hardware vertically; having more memory is also probably the main benefit for MySQL.

How to Calculate Memory Utilization

First of all, we need to be clear about which variables allocate memory during MySQL operations; we will cover only the common ones, as there are a bunch of them. We also need to know that some variables allocate memory globally, while others do a per-thread allocation.

For the sake of simplicity, we will cover this topic considering the usage of the standard storage engine: InnoDB.

We have globally allocated variables:

key_buffer_size: this MyISAM setting should be set to 8-16M, and anything above that is just wrong, because we shouldn’t use MyISAM tables unless there is a particular reason to. A typical scenario is MyISAM being used only by system tables, which are small (this is valid for versions up to 5.7; in MySQL 8 the system tables were migrated to the InnoDB engine). So the impact of this variable is negligible.

query_cache_size: defaults to 0 and was removed in MySQL 8.0, so we won’t consider it.

innodb_buffer_pool_size: the cache where InnoDB keeps data and index pages to operate on them. The bigger, the better.

Of course, there are others, but their impact is minimal when running with defaults.

Also, there are other variables that are allocated on each thread (or open connection):
read_buffer_size, read_rnd_buffer_size, sort_buffer_size, join_buffer_size, tmp_table_size, and a few others. All of them work very well by default, as the allocations are small and efficient. The main potential issue arises when many connections hold these buffers for some time, adding extra memory pressure. The ideal situation is to control how many connections are being opened (and used) and reduce that number to a level that doesn’t hurt the application.

But let’s not lose focus: we have more memory, and we need to know how to tune it properly to make the best use of it.

The setting with the biggest memory impact is innodb_buffer_pool_size, as this is where almost all the magic happens and it is usually the most significant memory consumer. An old rule of thumb says “this setting should be around 75% of available memory”, and some cloud vendors set this value to total_memory*0.75.

I said “old” because that rule was good when instances with 8G or 16G of RAM were common, so allocating roughly 6G out of 8G or 13G out of 16G used to be logical.

But what if we run an instance with 100G or even 200G? It’s not uncommon to see this type of hardware nowadays. Should we use 80G out of 100G, or 160G out of 200G? In other words, should we hold back 20G to 40G of memory for filesystem cache operations? While these filesystem operations are not useless, I don’t see the OS needing more than 4G-8G for this purpose on a dedicated DB server. It is also recommended to use the O_DIRECT flushing method for InnoDB, which bypasses the filesystem cache.
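For example, on a hypothetical dedicated server with 100G of RAM, the relevant my.cnf fragment could look like the sketch below (the exact numbers are assumptions; they depend on your connection count and workload):

# leave a few GB for the OS; O_DIRECT bypasses the filesystem cache anyway
innodb_buffer_pool_size = 90G
innodb_flush_method = O_DIRECT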

Example

Now that we understand the primary variables that allocate memory, let’s check a good use case I’m currently working on. Assume this system:

$ free -m
      total      used     free    shared    buff/cache    available
Mem: 385625    307295    40921         4         37408        74865

So roughly 380G of RAM, a nice amount of memory. Now let’s check the maximum potential allocation, considering the maximum number of connections used.

A little disclaimer here: while this query is not entirely accurate and can diverge from real results, it gives us a sense of what is potentially going to be allocated. We can also take advantage of the performance_schema database, though that may require enabling some instruments that are disabled by default.
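A minimal sketch of enabling the memory instrumentation at runtime (an assumption on my part that this is the set you need; memory allocated before enabling the instruments is not counted):

UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'memory/%';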

mysql > show global status like 'max_used_connections';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| Max_used_connections |    67 |
+----------------------+-------+
1 row in set (0.00 sec)

So with a maximum of 67 connections used, we can get:

mysql > SELECT ( @@key_buffer_size
-> + @@innodb_buffer_pool_size
-> + 67 * (@@read_buffer_size
-> + @@read_rnd_buffer_size
-> + @@sort_buffer_size
-> + @@join_buffer_size
-> + @@tmp_table_size )) / (1024*1024*1024) AS MAX_MEMORY_GB;
+---------------+
| MAX_MEMORY_GB |
+---------------+
| 316.4434      |
+---------------+
1 row in set (0.00 sec)

So far, so good: we are within memory ranges. Now let’s see how big innodb_buffer_pool_size is and whether it is well sized:

mysql > SELECT (@@innodb_buffer_pool_size) / (1024*1024*1024) AS BUFFER_POOL_SIZE;
+------------------+
| BUFFER_POOL_SIZE |
+------------------+
| 310.0000         |
+------------------+
1 row in set (0.01 sec)

So the buffer pool is 310G, roughly 82% of total memory, and the total potential usage so far is around 84%, which leaves around 60G of memory not being used by MySQL. Well, it is being used by the filesystem cache, which, in the end, is not used by InnoDB.

OK, now let’s get to the point: how to configure memory so it is used effectively by MySQL. From pt-mysql-summary we know that the buffer pool is fully filled:

Buffer Pool Size | 310.0G
Buffer Pool Fill | 100%

Does this mean we need more memory? Maybe. We know this instance has a working dataset that doesn’t fit in memory (the very reason why we increased memory size), so let’s check how many disk operations we have, using this command:

mysqladmin -r -i 1 -c 60 extended-status | egrep "Innodb_buffer_pool_read_requests|Innodb_buffer_pool_reads"
| Innodb_buffer_pool_read_requests | 99857480858|
| Innodb_buffer_pool_reads         | 598600690  |
| Innodb_buffer_pool_read_requests | 274985     |
| Innodb_buffer_pool_reads         | 1602       |
| Innodb_buffer_pool_read_requests | 267139     |
| Innodb_buffer_pool_reads         | 1562       |
| Innodb_buffer_pool_read_requests | 270779     |
| Innodb_buffer_pool_reads         | 1731       |
| Innodb_buffer_pool_read_requests | 287594     |
| Innodb_buffer_pool_reads         | 1567       |
| Innodb_buffer_pool_read_requests | 282786     |
| Innodb_buffer_pool_reads         | 1754       |

Innodb_buffer_pool_read_requests: page reads satisfied from memory (good)
Innodb_buffer_pool_reads: page reads from disk (bad)
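These two counters also give an approximate buffer pool hit ratio; a sketch using performance_schema (MySQL 5.7+):

SELECT 1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
      WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads')
    /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
      WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
) AS buffer_pool_hit_ratio;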

As you may notice, we still get some reads from disk, and we want to avoid them, so let’s increase the buffer pool size to 340G (90% of total memory).
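Since MySQL 5.7, innodb_buffer_pool_size is a dynamic variable, so the resize can be done online; a sketch (MySQL rounds the value to a multiple of the chunk size and resizes in the background):

SET GLOBAL innodb_buffer_pool_size = 340 * 1024 * 1024 * 1024;

With the new size in place, let’s check again: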

mysqladmin -r -i 1 -c 60 extended-status | egrep "Innodb_buffer_pool_read_requests|Innodb_buffer_pool_reads"
| Innodb_buffer_pool_read_requests | 99937722883 |
| Innodb_buffer_pool_reads         | 599056712   |
| Innodb_buffer_pool_read_requests | 293642      |
| Innodb_buffer_pool_reads         | 1           |
| Innodb_buffer_pool_read_requests | 296248      |
| Innodb_buffer_pool_reads         | 0           |
| Innodb_buffer_pool_read_requests | 294409      |
| Innodb_buffer_pool_reads         | 0           |
| Innodb_buffer_pool_read_requests | 296394      |
| Innodb_buffer_pool_reads         | 6           |
| Innodb_buffer_pool_read_requests | 303379      |
| Innodb_buffer_pool_reads         | 0           |

Now we barely go to disk at all, and the IO pressure has been released; this makes us happy, right?

Summary

If you increase the memory size of a server, you mostly need to focus on innodb_buffer_pool_size, as this is the most critical variable to tune. Allocating 90% to 95% of total available memory on big systems is not bad at all, as the OS requires only a few GB to run correctly, and a few more for swap should be enough to run without problems.

Also, check the maximum number of connections required (and used), as this is a common cause of memory issues. If you need to run with 1000 connections open, then allocating 90% of the memory to the buffer pool may not be possible, and some additional actions may be required (i.e., adding a proxy layer or a connection pool).

From MySQL 8, we have a new variable called innodb_dedicated_server, which will auto-calculate the memory allocation. While this variable is really useful as an initial approach, it may under-allocate memory in systems with more than 4G of RAM, as it sets buffer pool size = (detected server memory * 0.75); so in a 200G server, we get only 150G for the buffer pool.

Conclusion

Vertical scaling is the easiest and fastest way to improve performance, and it is also cheaper, but not magical. Tuning variables properly requires analysis and an understanding of how memory is being used. This post focused on the essential variables to consider when tuning memory allocation: innodb_buffer_pool_size and max_connections. Don’t over-tune when it’s not necessary, and be cautious of how these two affect your systems.

Aug 27, 2020

More on Checkpoints in InnoDB MySQL 8

Recently I posted about checkpointing in MySQL, where MySQL showed an interesting “wave” behavior.

Soon after, Dimitri posted a solution for how to fix the “waves,” and I would like to dig a little deeper into the proposed suggestions, as there is some material to process.

This post will be very heavy on InnoDB configuration, so let’s start with the basic configuration for MySQL; but before that, some details on the initial environment.

I use MySQL version 8.0.21 on the hardware described here.

As for the storage, I am not using some “old dusty SSD” but a production-grade, Enterprise Intel SATA SSD D3-S4510. This SSD is able to handle a throughput of 468MiB/sec of random writes, or 30000 IOPS of random writes in 16KiB blocks.

So initial configuration for my test was:

[mysqld]
datadir= /data/mysql8-8.0.21
user=mysql
bind_address = 0.0.0.0

socket=/tmp/mysql.sock
log-error=error.log

ssl=0
performance_schema=OFF

skip_log_bin
server_id = 7

# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000

join_buffer_size=256K
sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=10G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size= 140G
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_purge_threads=4
innodb_lru_scan_depth=512
innodb_log_buffer_size=64M

default_storage_engine=InnoDB

innodb_flush_log_at_trx_commit  = 1
innodb_doublewrite= 1
innodb_flush_method             = O_DIRECT
innodb_file_per_table           = 1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0

#innodb_monitor_enable=all
max_prepared_stmt_count=1000000

innodb_adaptive_hash_index=1
innodb_monitor_enable='%'

innodb-buffer-pool-load-at-startup=OFF
innodb_buffer_pool_dump_at_shutdown=OFF

There are a lot of parameters, so let’s highlight the most relevant ones for this test:

innodb_buffer_pool_size= 140G

The buffer pool size is enough to fit all data, which is about 100GB in size.

innodb_adaptive_hash_index=1

The adaptive hash index is enabled (as it is in the default InnoDB config).

innodb_buffer_pool_instances=8

This is what defaults provide, but I will increase it, following my previous post. 

innodb_log_file_size=10G

innodb_log_files_in_group=2

These parameters define a limit of 20GB for our redo logs, and this is important, as our workload will be “redo-log bounded,” as we will see from the results.

innodb_io_capacity=2000

innodb_io_capacity_max=4000

You may ask why I use 2000 and 4000 while the storage can handle 30000 IOPS.

This is a valid point, and as we will see later, these parameters are not high enough for this workload; but it also does not mean we should push them all the way up to 30000, as we will see from the results.

MySQL Manual says the following about innodb_io_capacity:

“The innodb_io_capacity variable defines the overall I/O capacity available to InnoDB. It should be set to approximately the number of I/O operations that the system can perform per second (IOPS). When innodb_io_capacity is set, InnoDB estimates the I/O bandwidth available for background tasks based on the set value.” 

From this, you may get the impression that if you set innodb_io_capacity to the I/O bandwidth of your storage, you should be fine. However, this part does not say what you should count as an I/O operation. For example, if your storage can perform 500MB/sec, then with 4KB block IO operations that is 125,000 IO per second, while with 16KB IO it is roughly 31,000 IO per second.

The MySQL manual leaves it up to your imagination, but as the typical InnoDB page size is 16KB, let’s assume we do 16KB-block IO.

However later on that page, we can read:

“Ideally, keep the setting as low as practical, but not so low that background activities fall behind. If the value is too high, data is removed from the buffer pool and change buffer too quickly for caching to provide a significant benefit. For busy systems capable of higher I/O rates, you can set a higher value to help the server handle the background maintenance work associated with a high rate of row changes”

and

“Consider write workload when tuning innodb_io_capacity. Systems with large write workloads are likely to benefit from a higher setting. A lower setting may be sufficient for systems with a small write workload.”

I do not see the manual providing much guidance about what value I should use, so we will test it.

Initial results

So if we benchmark with the initial parameters, we can see the “wave” pattern.

 

As for why this is happening, let’s check the Percona Monitoring and Management “InnoDB Checkpoint Age” chart:

Actually, “InnoDB Flushing by Type” in PMM does not show sync flushing yet, so I had to modify the chart a little to show “sync flushing” as the orange line:

And we immediately see that Uncheckpointed Bytes exceeds the Max Checkpoint Age of 16.61GiB, which is derived from the 20GiB of InnoDB log files. 16.61GiB is less than 20GiB because InnoDB reserves some cushion for exactly cases like this, so even if we exceed 16.61GiB, InnoDB still has an opportunity to flush data.

Also, we see that before Uncheckpointed Bytes exceeds Max Checkpoint Age, InnoDB flushes pages at a rate of 4000 IOPS, just as defined by innodb_io_capacity_max.

We should try to avoid Uncheckpointed Bytes exceeding Max Checkpoint Age, because when that happens, InnoDB goes into “emergency” flushing mode, and in fact, this is what causes the waves we see. I should have detected this in my previous post, mea culpa.
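Outside of PMM, a rough way to watch the checkpoint age is to compare the current LSN with the last checkpoint LSN; a sketch (it assumes the log_lsn counters are enabled, which innodb_monitor_enable='%' in the config above takes care of):

SELECT MAX(CASE WHEN NAME = 'log_lsn_current' THEN `COUNT` END)
     - MAX(CASE WHEN NAME = 'log_lsn_last_checkpoint' THEN `COUNT` END)
       AS checkpoint_age_bytes
  FROM information_schema.INNODB_METRICS
 WHERE NAME IN ('log_lsn_current', 'log_lsn_last_checkpoint');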

So the first conclusion we can make: if InnoDB does not flush fast enough, what if we increase innodb_io_capacity_max? Sure, let’s see. For simplification, in the next experiments I will use

innodb_io_capacity = innodb_io_capacity_max, unless specified otherwise.

Next run with innodb_io_capacity = innodb_io_capacity_max = 7000

Not much improvement, and this is also confirmed by the InnoDB Checkpoint Age chart:

InnoDB tries to flush more pages per second up to 5600 pages/sec, but it is not enough to avoid exceeding Max Checkpoint Age.

Why is this the case? The answer is the doublewrite buffer.

Even though MySQL improved the doublewrite buffer in MySQL 8.0.20, it does not perform well enough with the proposed defaults.

Well, at least the problem was finally addressed, because previously Oracle ran benchmarks with doublewrite disabled, just to hide and totally ignore the issue with doublewrite. For an example, check this.

But let’s get back to our 8.0.21 and its fixed doublewrite.

Dimitri mentions:

“the main config options for DBLWR in MySQL 8.0 are:

  • innodb_doublewrite_files = N
    innodb_doublewrite_pages = M”

Let’s check the manual again:

“The innodb_doublewrite_files variable is intended for advanced performance tuning. The default setting should be suitable for most users.

The innodb_doublewrite_pages variable (introduced in MySQL 8.0.20) controls the maximum number of doublewrite pages per thread. If no value is specified, innodb_doublewrite_pages is set to the innodb_write_io_threads value. This variable is intended for advanced performance tuning. The default value should be suitable for most users.”

Was it wrong to assume that the innodb_doublewrite_files and innodb_doublewrite_pages defaults provide values suitable for our use case?

But let’s try the values Dimitri recommended looking into. I will use

innodb_doublewrite_files=2 and innodb_doublewrite_pages=128
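Applied to the configuration above, the relevant my.cnf additions would be (a sketch; both variables exist as of MySQL 8.0.20):

innodb_doublewrite_files = 2
innodb_doublewrite_pages = 128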

Results with innodb_doublewrite_files=2 and innodb_doublewrite_pages=128

The problem with waves is fixed! 

And InnoDB Checkpoint Age chart:

Now we are able to keep Uncheckpointed Bytes under Max Checkpoint Age, and this is what fixed the “waves” pattern.

We can say that parallel doublewrite is a welcome new improvement, but the fact that one has to change innodb_doublewrite_pages in order to get improved performance is, in my opinion, a design flaw.

But there is still a lot of variation at 1-sec resolution, and small drops. Before we get to them, let’s take a look at another suggestion: use --innodb_adaptive_hash_index=0 (that is, disable the Adaptive Hash Index). I will use AHI=0 on the charts to mark this setting.

Let’s take a look at the results with the improved settings and with --innodb_adaptive_hash_index=0.

Results with --innodb_adaptive_hash_index=0

To see what the real improvement with --innodb_adaptive_hash_index=0 is, let’s compare bar charts:

Or in numeric form:

settings                                  Avg tps, last 2000 sec
io_cap_max=7000,doublewrite=opt           7578.69
io_cap_max=7000,doublewrite=opt,AHI=0     7996.33

So --innodb_adaptive_hash_index=0 really brings some improvement, about 5.5%, so I will use --innodb_adaptive_hash_index=0 for further experiments.

Let’s see if increasing innodb_buffer_pool_instances to 32 will help smooth the periodic variance.

Results with innodb_buffer_pool_instances=32

So indeed, using innodb_buffer_pool_instances=32 gives us less variation, keeping overall throughput about the same: 7936.28 tps in this case.

Now let’s review the parameter innodb_change_buffering=none, which Dimitri also suggests.

Results with innodb_change_buffering=none

There is NO practical difference if we disable the change buffer.

And if we take a look at PMM change buffer chart:

We can see there is NO change buffer activity outside of the initial 20 mins. I am not sure why Dimitri suggested disabling it. In fact, the change buffer can be quite useful, and I will show that in benchmarks of different workloads.

Now let’s take a look at the suggested settings with innodb_io_capacity = innodb_io_capacity_max = 8000, that is, INCREASING innodb_io_capacity_max, and compare with the results for innodb_io_capacity_max = 7000.

Or in tabular form:

settings                                      Avg tps, last 2000 sec
io_cap_max=7000,doublewrite=opt,AHI=0,BPI=32  7936.28
io_cap_max=8000,doublewrite=opt,AHI=0,BPI=32  7693.08

Actually, with innodb_io_capacity_max=8000 the throughput is LESS than with innodb_io_capacity_max=7000.

Can you guess why? 

Let’s compare InnoDB Checkpoint Age.

This is for innodb_io_capacity_max=8000:

And this is for innodb_io_capacity_max=7000:

This is like a child’s game: Find the difference.

The difference is that with innodb_io_capacity_max=7000,
Uncheckpointed Bytes is 13.66 GiB,
and with innodb_io_capacity_max=8000,
Uncheckpointed Bytes is 12.51 GiB.

What does this mean? It means that with innodb_io_capacity_max=7000, InnoDB HAS to flush FEWER pages and still stays within Max Checkpoint Age.

In fact, if we try to push even further and use innodb_io_capacity_max=innodb_io_capacity=6500, we get this InnoDB Checkpoint Age chart:

Where Uncheckpointed Bytes is 15.47 GiB. Does it improve throughput? Absolutely!

settings                                      Avg tps, last 2000 sec
io_cap_max=6500,doublewrite=opt,AHI=0,BPI=32  8233.628
io_cap_max=7000,doublewrite=opt,AHI=0,BPI=32  7936.283
io_cap_max=8000,doublewrite=opt,AHI=0,BPI=32  7693.084

The difference between innodb_io_capacity_max=6500 and innodb_io_capacity_max=8000 is 7%.

Now it becomes clear what the manual means in the part where it says:

“Ideally, keep the setting as low as practical, but not so low that background activities fall behind”

So we really need to set innodb_io_capacity_max just high enough that Uncheckpointed Bytes stays under Max Checkpoint Age, but not much higher; otherwise InnoDB will do more work than is needed, and it will affect the throughput.

In my opinion, this is a serious design flaw in InnoDB Adaptive Flushing: you actually need to wiggle innodb_io_capacity_max to achieve appropriate results.

Inverse relationship between innodb_io_capacity_max and innodb_log_file_size

To show an even more complicated relationship between innodb_io_capacity_max and innodb_log_file_size, let’s consider the following experiment.

We will increase innodb_log_file_size from 10GB to 20GB, effectively doubling our redo log capacity (two log files, so 40GB in total).

And now let’s check the InnoDB Checkpoint Age chart with innodb_io_capacity_max=7000:

We can see there is a lot of space in the InnoDB logs that InnoDB does not use: there is only 22.58GiB of Uncheckpointed Bytes, while 33.24GiB is available.

So what happens if we decrease innodb_io_capacity_max to 4500?

InnoDB Checkpoint Age with innodb_io_capacity_max=4500:

In this setup, we can push Uncheckpointed Bytes to 29.80 GiB, and it has a positive effect on the throughput.

Let’s compare throughput:

settings                                                    Avg tps, last 2000 sec
io_cap_max=4500,log_size=40GB,doublewrite=opt,AHI=0,BPI=32  9865.308
io_cap_max=7000,log_size=40GB,doublewrite=opt,AHI=0,BPI=32  9374.121

So by decreasing innodb_io_capacity_max from 7000 to 4500, we gain 5.2% in throughput.

Please note that we can’t keep decreasing innodb_io_capacity_max, because then Uncheckpointed Bytes risks exceeding Max Checkpoint Age, which leads to the negative effects of emergency flushing.

So again, in order to improve throughput, we should DECREASE innodb_io_capacity_max, but only down to a certain threshold. We should not set innodb_io_capacity_max to the 30000 IOPS the SATA SSD can really provide.

Again, for me, this is a major design flaw in the current InnoDB Adaptive Flushing. Please note this was a static workload; if your workload changes during the day, it is practically impossible to come up with an optimal value.

Conclusions:

Trying to summarize all of the above, I want to highlight:

  • To fix the “wave” pattern, we need to tune innodb_io_capacity_max and innodb_doublewrite_pages.
  • InnoDB parallel doublewrite in MySQL 8.0.20 is definitely a positive improvement, but the default values seem poorly chosen, in contradiction with the manual. I wish Oracle/MySQL shipped features that work out of the box for most users.
  • The InnoDB Adaptive Hash Index is not helping here, and you get better performance by disabling it. I have also observed in other workloads that the InnoDB Adaptive Hash Index might be another broken subsystem, which Oracle declines to fix and just disables in its benchmarks.
  • The InnoDB Change Buffer has no effect on this workload, so you may or may not disable it; there is no difference. But I have seen a positive effect from the InnoDB Change Buffer in other workloads, so I do not recommend blindly disabling it.
  • Now, about InnoDB Adaptive Flushing. In my opinion, it relies too much on manual tuning of innodb_io_capacity_max, which in fact has little to do with the real storage IO capacity. Often you need to lower innodb_io_capacity_max to get better performance, but not make it too low, because at some point it will hurt performance. The best way to monitor this is to check the InnoDB Checkpoint Age chart in PMM.
  • I would encourage Oracle to fix the broken design of InnoDB Adaptive Flushing so that it detects IO capacity automatically and does not flush aggressively, but keeps Uncheckpointed Bytes just under Max Checkpoint Age. Let’s hope Oracle fixes this faster than it fixed the doublewrite buffer, because history shows that to force Oracle to make improvements in the InnoDB IO subsystem, we need to do it first in Percona Server for MySQL, as we did with the parallel doublewrite buffer. For reference, parallel doublewrite was implemented first in Percona Server for MySQL 5.7.11-4, released March 15th, 2016. Oracle implemented parallel doublewrite (with suboptimal default settings) in MySQL 8.0.20, released four years later, on April 4th, 2020.

Aug 27, 2020

MySQL 8.0.19 InnoDB ReplicaSet Configuration and Manual Switchover

InnoDB ReplicaSet was introduced in MySQL 8.0.19. It works on top of MySQL asynchronous replication. Generally, InnoDB ReplicaSet does not provide high availability on its own like InnoDB Cluster does, because with InnoDB ReplicaSet we need to perform failover manually. AdminAPI includes support for InnoDB ReplicaSet, and we can operate the ReplicaSet using MySQL Shell.

  • InnoDB Cluster is the combination of MySQL Shell, Group Replication, and MySQL Router
  • InnoDB ReplicaSet is the combination of MySQL Shell, traditional MySQL asynchronous replication, and MySQL Router

Why InnoDB ReplicaSet?

  • You can manually perform the switchover and failover with InnoDB ReplicaSet
  • You can easily add new nodes to your replication environment. InnoDB ReplicaSet helps with data provisioning (using the MySQL Clone plugin) and setting up replication.

In this blog, I am going to explain the process involved in the following topics:

  • How to set up the InnoDB ReplicaSet in a fresh environment?
  • How to perform the manual switchover with ReplicaSet?

Before going into the topic, here is a summary of the points you should be aware of when working with InnoDB ReplicaSet:

  • ReplicaSet only supports GTID-based replication environments.
  • The MySQL version should be 8.0+.
  • Only row-based replication is supported.
  • Replication filters are not supported with InnoDB ReplicaSet.
  • InnoDB ReplicaSet should have one primary node (master) and one or multiple secondary nodes (slaves). All the secondary nodes are configured under the primary node.
  • There is no limit on secondary nodes; you can configure many nodes under a ReplicaSet.
  • It supports only manual failover.
  • InnoDB ReplicaSet should be completely managed with MySQL Shell.

How to set up the InnoDB ReplicaSet in a fresh environment?

I have created two servers (replicaset1, replicaset2) for testing purposes. My goal is to create the InnoDB ReplicaSet with one primary node and one secondary node. I installed Percona Server for MySQL 8.0.20 for my testing.

Step 1:

Allow hostname-based communication. Make sure you configure this on all the servers that participate in the ReplicaSet.

#vi /etc/hosts
172.28.128.20 replicaset1 replicaset1
172.28.128.21 replicaset2 replicaset2

Step 2:

In this step, I am going to prepare the MySQL instances for InnoDB ReplicaSet. Below are the major tasks that need to be performed as part of this operation.

  • Create a dedicated user account to effectively manage the ReplicaSet. The account will be automatically created with sufficient privileges.
  • MySQL parameter changes that need to be applied for InnoDB ReplicaSet (persisted settings).
  • Restart the MySQL instance to apply the changes.

Command : dba.configureReplicaSetInstance()

Connecting to the shell:

[root@replicaset1 ~]# mysqlsh --uri root@localhost
Please provide the password for 'root@localhost': *************
Save password for 'root@localhost'? [Y]es/[N]o/Ne[v]er (default No): y
MySQL Shell 8.0.20

Configuring the instance:

Once you trigger the command, it will start to interact with you, and you have to choose the needed options.

MySQL  localhost:33060+ ssl  JS > dba.configureReplicaSetInstance()
Configuring local MySQL instance listening at port 3306 for use in an InnoDB ReplicaSet...
This instance reports its own address as replicaset1:3306
Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.
ERROR: User 'root' can only connect from 'localhost'. New account(s) with proper source address specification to allow remote connection from all instances must be created to manage the cluster.

1) Create remotely usable account for 'root' with same grants and password
2) Create a new admin account for InnoDB ReplicaSet with minimal required grants
3) Ignore and continue
4) Cancel

Please select an option [1]: 2
Please provide an account name (e.g: icroot@%) to have it created with the necessary
privileges or leave empty and press Enter to cancel.
Account Name: InnodbReplicaSet
Password for new account: ********
Confirm password: ********

NOTE: Some configuration options need to be fixed:
+--------------------------+---------------+----------------+--------------------------------------------------+
| Variable                 | Current Value | Required Value | Note                                             |
+--------------------------+---------------+----------------+--------------------------------------------------+
| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |
| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |
| server_id                | 1             | <unique ID>    | Update read-only variable and restart the server |
+--------------------------+---------------+----------------+--------------------------------------------------+
Some variables need to be changed, but cannot be done dynamically on the server.
Do you want to perform the required configuration changes? [y/n]: y
Do you want to restart the instance after configuring it? [y/n]: y
Cluster admin user 'InnodbReplicaSet'@'%' created.
Configuring instance...
The instance 'replicaset1:3306' was configured to be used in an InnoDB ReplicaSet.
Restarting MySQL...
NOTE: MySQL server at replicaset1:3306 was restarted.

You can find the updated parameters in the file “mysqld-auto.cnf”. The blog by Marco Tusa has more details about the PERSIST configuration.

[root@replicaset1 mysql]# cat mysqld-auto.cnf 

{ "Version" : 1 , "mysql_server" : { "server_id" : { "Value" : "3391287398" , "Metadata" : { "Timestamp" : 1598084590766958 , "User" : "root" , "Host" : "localhost" } } , "read_only" : { "Value" : "OFF" , "Metadata" : { "Timestamp" : 1598084718849667 , "User" : "InnodbReplicaSet" , "Host" : "localhost" } } , "super_read_only" : { "Value" : "ON" , "Metadata" : { "Timestamp" : 1598084898510380 , "User" : "InnodbReplicaSet" , "Host" : "localhost" } } , "mysql_server_static_options" : { "enforce_gtid_consistency" : { "Value" : "ON" , "Metadata" : { "Timestamp" : 1598084590757563 , "User" : "root" , "Host" : "localhost" } } , "gtid_mode" : { "Value" : "ON" , "Metadata" : { "Timestamp" : 1598084590766121 , "User" : "root" , "Host" : "localhost" } } } } }

Note:

  • Make sure this step is executed on all the MySQL instances that are going to participate in the ReplicaSet group.
  • Make sure the cluster account name and password are the same on all MySQL instances.

Step 3:

In this step, I am going to switch my login to the ReplicaSet account that was created in Step 2.

MySQL  localhost:33060+ ssl  JS > \connect InnodbReplicaSet@replicaset1
Creating a session to 'InnodbReplicaSet@replicaset1'
Please provide the password for 'InnodbReplicaSet@replicaset1': ********
Save password for 'InnodbReplicaSet@replicaset1'? [Y]es/[N]o/Ne[v]er (default No): y
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 8 (X protocol)
Server version: 8.0.20-11 Percona Server (GPL), Release 11, Revision 5b5a5d2

Step 4:

Now everything is set to create the ReplicaSet.

Command : dba.createReplicaSet(‘<ReplicaSet Name>’)

MySQL  replicaset1:33060+ ssl  JS > dba.createReplicaSet('PerconaReplicaSet')
A new replicaset with instance 'replicaset1:3306' will be created.

* Checking MySQL instance at replicaset1:3306
This instance reports its own address as replicaset1:3306
replicaset1:3306: Instance configuration is suitable.

* Updating metadata...
ReplicaSet object successfully created for replicaset1:3306.
Use rs.addInstance() to add more asynchronously replicated instances to this replicaset and rs.status() to check its status.
<ReplicaSet:PerconaReplicaSet>

The ReplicaSet is created with the name “PerconaReplicaSet”.

Step 5:

In this step, I am going to assign the ReplicaSet to a variable and check the ReplicaSet status. Assigning to a variable can also be done while creating the ReplicaSet (i.e. var replicaset = dba.createReplicaSet('<ReplicaSet Name>')).

MySQL  replicaset1:33060+ ssl  JS > replicaset = dba.getReplicaSet()
You are connected to a member of replicaset 'PerconaReplicaSet'.
<ReplicaSet:PerconaReplicaSet>
 MySQL  replicaset1:33060+ ssl  JS > 
 MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet", 
        "primary": "replicaset1:3306", 
        "status": "AVAILABLE", 
        "statusText": "All instances available.", 
        "topology": {
            "replicaset1:3306": {
                "address": "replicaset1:3306", 
                "instanceRole": "PRIMARY", 
                "mode": "R/W", 
                "status": "ONLINE"
            }
        }, 
        "type": "ASYNC"
    }
}

The ReplicaSet status shows that the instance replicaset1 is operational and is the PRIMARY member.

Step 6:

Now I need to add the secondary instance “replicaset2” to the ReplicaSet.

When adding a new instance, it must fulfill all the ReplicaSet requirements. We have two recovery methods when joining a new node:

Clone: It takes a snapshot from an ONLINE instance, builds the target node from that snapshot, and finally adds it to the ReplicaSet. This method is always recommended when adding fresh nodes.

Incremental: This method relies on MySQL replication and applies all the transactions that are missing on the new instance. This can be faster when the number of missing transactions is small.

Command : replicaset.addInstance(‘<instance name>:<port>’)

MySQL  replicaset1:33060+ ssl  JS > replicaset.addInstance('replicaset2:3306')
Adding instance to the replicaset...
* Performing validation checks
This instance reports its own address as replicaset2:3306
replicaset2:3306: Instance configuration is suitable.
* Checking async replication topology...
* Checking transaction state of the instance...

NOTE: The target instance 'replicaset2:3306' has not been pre-provisioned (GTID set is empty). The Shell is unable to decide whether replication can completely recover its state.
The safest and most convenient way to provision a new instance is through automatic clone provisioning, which will completely overwrite the state of 'replicaset2:3306' with a physical snapshot from an existing replicaset member. To use this method by default, set the 'recoveryMethod' option to 'clone'.

WARNING: It should be safe to rely on replication to incrementally recover the state of the new instance if you are sure all updates ever executed in the replicaset were done with GTIDs enabled, there are no purged transactions and the new instance contains the same GTID set as the replicaset or a subset of it. To use this method by default, set the 'recoveryMethod' option to 'incremental'.
Please select a recovery method [C]lone/[I]ncremental recovery/[A]bort (default Clone): C
* Updating topology
Waiting for clone process of the new member to complete. Press ^C to abort the operation.
* Waiting for clone to finish...
NOTE: replicaset2:3306 is being cloned from replicaset1:3306
** Stage DROP DATA: Completed
** Clone Transfer  
    FILE COPY  ############################################################  100%  Completed
    PAGE COPY  ############################################################  100%  Completed
    REDO COPY  ############################################################  100%  Completed
** Stage RECOVERY: |
NOTE: replicaset2:3306 is shutting down...

* Waiting for server restart... ready
* replicaset2:3306 has restarted, waiting for clone to finish...
* Clone process has finished: 60.68 MB transferred in about 1 second (~60.68 MB/s)
** Configuring replicaset2:3306 to replicate from replicaset1:3306
** Waiting for new instance to synchronize with PRIMARY...
The instance 'replicaset2:3306' was added to the replicaset and is replicating from replicaset1:3306.

Here I have chosen the clone method for recovery. 
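If you prefer to skip the interactive prompt, the recovery method can also be passed as an option (a sketch with the same instance name as above):

replicaset.addInstance('replicaset2:3306', {recoveryMethod: 'clone'})

Checking the status after the clone: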

MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet", 
        "primary": "replicaset1:3306", 
        "status": "AVAILABLE", 
        "statusText": "All instances available.", 
        "topology": {
            "replicaset1:3306": {
                "address": "replicaset1:3306", 
                "instanceRole": "PRIMARY", 
                "mode": "R/W", 
                "status": "ONLINE"
            }, 
            "replicaset2:3306": {
                "address": "replicaset2:3306", 
                "instanceRole": "SECONDARY", 
                "mode": "R/O", 
                "replication": {
                    "applierStatus": "APPLIED_ALL", 
                    "applierThreadState": "Slave has read all relay log; waiting for more updates", 
                    "receiverStatus": "ON", 
                    "receiverThreadState": "Waiting for master to send event", 
                    "replicationLag": null
                }, 
                "status": "ONLINE"
            }
        }, 
        "type": "ASYNC"
    }
}

The second instance has been added to the ReplicaSet. 

How to perform the manual switchover with ReplicaSet? 

As per the current topology,

  • replicaset1 is the PRIMARY
  • replicaset2 is the SECONDARY

Requirement: As part of a maintenance activity, I am planning to remove the server “replicaset1” from the ReplicaSet. This needs to be performed in a safe manner, and the secondary instance “replicaset2” should remain available for application writes and reads.

  • First, I need to promote “replicaset2” as the PRIMARY.
  • Then, remove the “replicaset1” from the group.

Switching “replicaset2” to PRIMARY:

Command : replicaset.setPrimaryInstance(‘host:port’)

MySQL  replicaset1:33060+ ssl  JS > replicaset.setPrimaryInstance('replicaset2:3306')
replicaset2:3306 will be promoted to PRIMARY of 'PerconaReplicaSet'.
The current PRIMARY is replicaset1:3306.

* Connecting to replicaset instances
** Connecting to replicaset1:3306
** Connecting to replicaset2:3306
** Connecting to replicaset1:3306
** Connecting to replicaset2:3306
* Performing validation checks
** Checking async replication topology...
** Checking transaction state of the instance...
* Synchronizing transaction backlog at replicaset2:3306
* Updating metadata
* Acquiring locks in replicaset instances
** Pre-synchronizing SECONDARIES
** Acquiring global lock at PRIMARY
** Acquiring global lock at SECONDARIES
* Updating replication topology
** Configuring replicaset1:3306 to replicate from replicaset2:3306
replicaset2:3306 was promoted to PRIMARY.

You can see that “replicaset2” has been promoted to PRIMARY.

MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet", 
        "primary": "replicaset2:3306", 
        "status": "AVAILABLE", 
        "statusText": "All instances available.", 
        "topology": {
            "replicaset1:3306": {
                "address": "replicaset1:3306", 
                "instanceRole": "SECONDARY", 
                "mode": "R/O", 
                "replication": {
                    "applierStatus": "APPLIED_ALL", 
                    "applierThreadState": "Slave has read all relay log; waiting for more updates", 
                    "receiverStatus": "ON", 
                    "receiverThreadState": "Waiting for master to send event", 
                    "replicationLag": null
                }, 
                "status": "ONLINE"
            }, 
            "replicaset2:3306": {
                "address": "replicaset2:3306", 
                "instanceRole": "PRIMARY", 
                "mode": "R/W", 
                "status": "ONLINE"
            }
        }, 
        "type": "ASYNC"
    }
}

Removing “replicaset1” from the group:

Command : replicaset.removeInstance(‘host:port’)

MySQL  replicaset1:33060+ ssl  JS > replicaset.removeInstance('replicaset1:3306')
The instance 'replicaset1:3306' was removed from the replicaset.
MySQL  replicaset1:33060+ ssl  JS > 
MySQL  replicaset1:33060+ ssl  JS > replicaset.status()
{
    "replicaSet": {
        "name": "PerconaReplicaSet", 
        "primary": "replicaset2:3306", 
        "status": "AVAILABLE", 
        "statusText": "All instances available.", 
        "topology": {
            "replicaset2:3306": {
                "address": "replicaset2:3306", 
                "instanceRole": "PRIMARY", 
                "mode": "R/W", 
                "status": "ONLINE"
            }
        }, 
        "type": "ASYNC"
    }
}

We can perform a forced failover using “ReplicaSet.forcePrimaryInstance()”. This is dangerous and recommended only in disaster-type scenarios.
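A sketch of what that call looks like (hypothetical usage, assuming the old PRIMARY is unreachable and we are connected to a surviving member):

var replicaset = dba.getReplicaSet()
replicaset.forcePrimaryInstance('replicaset2:3306')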

MySQL InnoDB ReplicaSet is a very good feature for managing MySQL asynchronous replication environments. It has CLONE plugin support, which greatly helps with data provisioning and setting up replication. But it still has some limitations when compared with MySQL InnoDB Cluster.

Aug 14, 2020

Part Two: How Many innodb_buffer_pool_instances Do You Need in MySQL 8 With a CPU-Bound Workload?

Part one of this series can be found here: How Many innodb_buffer_pool_instances Do You Need in MySQL 8?

Following up on my recent benchmark posts on MySQL and MariaDB, MySQL and MariaDB on Enterprise SSD Storage and How MySQL and MariaDB Perform on NVMe Storage, I wanted to dig a little deeper and understand how different MySQL parameters affect performance.

One of the obscure MySQL parameters (in my opinion) is innodb_buffer_pool_instances. In particular, I do not have a feel for how many you need, nor for how to tell whether a chosen value is a good or a bad one.

In our articles MySQL 5.7 Performance Tuning After Installation and MySQL 101: Parameters to Tune for MySQL Performance, we recommended using a value of “8”, but I can’t really say if this is a good enough value. This post is a follow-up to my previous article about the results we got with different innodb_buffer_pool_instances values in an IO-bound scenario.

So let’s see what results we get with different innodb_buffer_pool_instances values, under the following scenario: I will use the sysbench oltp_read_write benchmark with pareto distribution and set innodb_buffer_pool_size=140GB for a database 100GB in size, so this will be a CPU-bound workload.

Benchmark

The hardware I use is:

System | Supermicro; SYS-F619P2-RTN; v0123456789 (Other)
   Platform | Linux
    Release | Ubuntu 18.04.4 LTS (bionic)
     Kernel | 5.3.0-42-generic
Architecture | CPU = 64-bit, OS = 64-bit
  Threading | NPTL 2.27
    SELinux | No SELinux detected
Virtualized | No virtualization detected
# Processor ##################################################
 Processors | physical = 2, cores = 40, virtual = 80, hyperthreading = yes
     Models | 80xIntel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
     Caches | 80x28160 KB
# Memory #####################################################
      Total | 187.6G

With the storage on  SATA SSD INTEL SSDSC2KB960G8 (Intel Enterprise-grade SSD D3-S4510).

A short settings overview:

  • Data fits into memory (the datasize is ~100GB, memory on the server is 188GB, and we allocate 140GB for MySQL innodb_buffer_pool_size).
  • The workload on storage will be write-intensive (reads are served from memory), with fully ACID-compliant and data-safe settings in MySQL.
  • For SATA SSD storage, I will use innodb_io_capacity = 2000 and innodb_io_capacity_max = 4000.
  • I will test the following values for innodb_buffer_pool_instances: 1, 2, 4, 8, 16, 32, 64.
  • innodb_buffer_pool_instances = 64 is the maximum value allowed by MySQL.
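For reference, a sketch of the my.cnf fragment behind these runs (values taken from the bullets above; innodb_buffer_pool_instances is the variable under test):

innodb_buffer_pool_size = 140G
innodb_buffer_pool_instances = 8   # varied per run: 1, 2, 4, 8, 16, 32, 64
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000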

The benchmark command line is:

sysbench oltp_read_write --threads=150 --time=10000 --tables=40 --table_size=10000000 --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest --max-requests=0 --report-interval=1 --mysql-db=sbtest --mysql-ssl=off --create_table_options="DEFAULT CHARSET=utf8mb4" --report_csv=yes --rand-type=pareto run

This means that the benchmark will run for three hours, reporting throughput every 1 sec. I will use a three-hour time frame for a reason that will become apparent later from the results.

Results on SATA SSD

Let’s see what results I’ve got individually for each of innodb_buffer_pool_instances:

innodb_buffer_pool_instances=1

innodb_buffer_pool_instances=2

innodb_buffer_pool_instances=4

innodb_buffer_pool_instances=8

innodb_buffer_pool_instances=16

innodb_buffer_pool_instances=32

innodb_buffer_pool_instances=64

What seems apparent is that as we increase innodb_buffer_pool_instances, it has a positive effect on variation in the throughput. We can condense the results in a single chart to see it closely:


And if we want to compare the throughput and deviation, let’s compare the results for the last 2500 sec:


So actually, innodb_buffer_pool_instances=64 showed the best throughput and the least variability, as even with innodb_buffer_pool_instances=32 there are noticeable stalls.

With innodb_buffer_pool_instances from 1 to 8, there are regular stalls, including drops to 0 tps for a few seconds.

Increasing innodb_buffer_pool_instances over 64 seems to have no effect, as 64 is the maximum value MySQL allows.

Final Thoughts

For this particular case, innodb_buffer_pool_instances=64 was the best choice, but I still can’t recommend a reliable way to find the optimal value. Keep in mind that this setting depends on the hardware (I tested on hardware with 80 CPU cores) and on the workload.

Part one of this series can be found here: How Many innodb_buffer_pool_instances Do You Need in MySQL 8?

Aug 13, 2020

How Many innodb_buffer_pool_instances Do You Need in MySQL 8?

Following up on my recent benchmark posts on MySQL and MariaDB, MySQL and MariaDB on Enterprise SSD Storage and How MySQL and MariaDB Perform on NVMe Storage, I wanted to dig a little deeper and understand how different MySQL parameters affect performance.

One of the obscure MySQL parameters (in my opinion) is innodb_buffer_pool_instances. In particular, I do not have a feel for how many you need, nor for how to tell whether a chosen value is a good or a bad one.

In our articles MySQL 5.7 Performance Tuning After Installation and MySQL 101: Parameters to Tune for MySQL Performance, we recommended using a value of “8”, but I can’t really say if this is a good enough value. So let’s see what results we get with different innodb_buffer_pool_instances values, under the following scenario:

I will use the sysbench oltp_read_write benchmark with pareto distribution. I will set innodb_buffer_pool_size=25GB for a database 100GB in size, so there will be competition for buffer pool space, and this will be an IO-bound scenario.

Benchmark

The hardware I use is:

System | Supermicro; SYS-F619P2-RTN; v0123456789 (Other)
   Platform | Linux
    Release | Ubuntu 18.04.4 LTS (bionic)
     Kernel | 5.3.0-42-generic
Architecture | CPU = 64-bit, OS = 64-bit
  Threading | NPTL 2.27
    SELinux | No SELinux detected
Virtualized | No virtualization detected
# Processor ##################################################
 Processors | physical = 2, cores = 40, virtual = 80, hyperthreading = yes
     Models | 80xIntel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
     Caches | 80x28160 KB
# Memory #####################################################
      Total | 187.6G

With the storage on SATA SSD INTEL SSDSC2KB960G8 (Intel Enterprise-grade SSD D3-S4510).

The short settings overview:

  • Data does not fit into memory (the datasize is ~100GB, memory on the server is 188GB, and we allocate 25GB for MySQL innodb_buffer_pool_size with O_DIRECT, so even though there is a lot of memory on the server, no more than the specified 25GB is used for data).
  • The workload on storage will be very read-write-intensive (reads are done from the storage), with fully ACID-compliant and data-safe settings in MySQL.
  • For SATA SSD storage, I will use innodb_io_capacity = 2000 and innodb_io_capacity_max = 4000.
  • I will test the following values for innodb_buffer_pool_instances: 1, 2, 4, 8, 16, 32, 64.
  • innodb_buffer_pool_instances = 64 is the maximum value allowed by MySQL.
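Since innodb_buffer_pool_instances is not a dynamic variable, every tested value requires a restart. A hypothetical way to cycle through the values (assuming MySQL 8.0's SET PERSIST_ONLY and a systemd service named mysql):

# each value requires a restart; PERSIST_ONLY writes it to mysqld-auto.cnf
for n in 1 2 4 8 16 32 64; do
    mysql -e "SET PERSIST_ONLY innodb_buffer_pool_instances = ${n};"
    systemctl restart mysql
    # run the sysbench command below for this value
done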

The benchmark command line is:

sysbench oltp_read_write --threads=150 --time=10000 --tables=40 --table_size=10000000 --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest --max-requests=0 --report-interval=1 --mysql-db=sbtest --mysql-ssl=off --create_table_options="DEFAULT CHARSET=utf8mb4" --report_csv=yes --rand-type=pareto run

This means that the benchmark will run for three hours with reporting throughput every 1 sec. I will use a three-hour time frame for a reason which will be apparent later from the results.

Results on SATA SSD

Let’s see what results I’ve got individually for each of innodb_buffer_pool_instances:

innodb_buffer_pool_instances=1

innodb_buffer_pool_instances=2

innodb_buffer_pool_instances=4

innodb_buffer_pool_instances=8

innodb_buffer_pool_instances=16

innodb_buffer_pool_instances=32

innodb_buffer_pool_instances=64

What seems apparent is that as we increase innodb_buffer_pool_instances, it has a positive effect on variation in the throughput. We can condense the results in a single chart to see it closely:


And if we want to compare the throughput and deviation, let’s compare the results for the last 2500 sec:

So actually, innodb_buffer_pool_instances=64 showed the best throughput and the least variability. The recommended innodb_buffer_pool_instances=8 looks better than values 1-4 in terms of variability, but it does not produce the best throughput.

Final Thoughts

For this particular case, innodb_buffer_pool_instances=64 was the best choice, but I still can’t recommend a reliable way to find the optimal value. Small values (1-4) seem to produce a lot of variability or even stalls, so starting from 8 is a good choice.

Part Two of this series can be found here: How Many innodb_buffer_pool_instances Do You Need in MySQL 8 With a CPU-Bound Workload?

Sep 04, 2019

Simulating InnoDB Secondary Index Corruption

Working as a support engineer here at Percona is amazing, as you get a variety of requests, ranging from the most trivial questions to ones that require source code review and an understanding of InnoDB internals.

In our world where High Availability is a must, everything is about being proactive, and when we need to be reactive, we must act fast. To do so, we need to ensure we have proper monitoring in place that covers all possible failure scenarios. Unfortunately, that is not always possible, and we are always improving and learning as we face new types of issues.

A few days ago, one of our customers faced an issue where MySQL identified InnoDB secondary index corruption and marked the table as corrupted instead of crashing the server. Even though one might think a single corrupted table is not as bad as an entire server crash, this particular table was key for the whole application, which in the end caused an outage anyway. They wanted to artificially simulate this situation in order to improve their QA / monitoring.

The Error

2019-08-26T15:47:10.526458Z 2 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace test/tb1 page [page id: space=48, page number=8]. You may have to recover from a backup.
2019-08-26T15:47:10.526465Z 2 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex dc67f35500000008ffffffff000000090000000000 ... 
InnoDB: End of page dump
2019-08-26T15:47:10.594440Z 2 [Note] InnoDB: Uncompressed page, stored checksum in field1 3697800021, calculated checksums for field1: crc32 3697800020/786197600, innodb 812191569, none 3735928559, stored checksum in field2 3697800020, calculated checksums for field2: crc32 3697800020/786197600, innodb 1337770384, none 3735928559,  page LSN 0 3082690, low 4 bytes of LSN at page end 3082690, page number (if stored to page already) 8, space id (if created with >= MySQL-4.1.1 and stored already) 48
InnoDB: Page may be an index page where index id is 49
2019-08-26T15:47:10.594466Z 2 [Note] InnoDB: Index 49 is `b` in table `test`.`tb1`
2019-08-26T15:47:10.594474Z 2 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2019-08-26T15:47:10.594571Z 2 [ERROR] Got error 126 when reading table './test/tb1'

InnoDB will retry the read 100 times, and if it fails on all of the attempts, the server will assert; otherwise, the behavior varies depending on the version you are using. On 5.5 it will consider the table corrupted, while on 5.7 it will mark the table as missing in the data dictionary. Either way, from this moment forward, all queries on that particular table will fail.

InnoDB Internals

InnoDB has a defined layout for its on-disk structures (redo log, .ibd files, ...).

If you are interested in learning or poking around, Jeremy Cole has a project to diagram most of the internal InnoDB structures at https://github.com/jeremycole/innodb_diagrams and also a powerful tool to extract data from InnoDB at https://github.com/jeremycole/innodb_ruby. The intention of this post is to show you how to do it manually, so you can get more exposure to the internals.

To simulate the corruption, we will need to read only a small part of each page header. Here is the list and the locations as per the 5.7 source code:

  • Each page is defined by a type – FIL_PAGE_TYPE
  • An index page has a value of 45bf (or 17855 in decimal) – FIL_PAGE_INDEX
  • Checksum is stored from byte/offset 0 to 4 of each page – FIL_PAGE_SPACE_OR_CHKSUM
  • Index ID is stored at byte/offset 28 of each page's data – PAGE_INDEX_ID

#define FIL_PAGE_TYPE 24 /*!< file page type: FIL_PAGE_INDEX,...,
                           2 bytes. */

#define FIL_PAGE_INDEX 17855 /*!< B-tree node */ 

#define FIL_PAGE_SPACE_OR_CHKSUM 0 /*!< in < MySQL-4.0.14 space id the
page belongs to (== 0) but in later
versions the 'new' checksum of the
page */ 

/**************************************************************//**
Gets the index id field of a page.
@return index id */
UNIV_INLINE
index_id_t
btr_page_get_index_id(
/*==================*/
const page_t* page) /*!< in: index page */
{
return(mach_read_from_8(page + PAGE_HEADER + PAGE_INDEX_ID));
} 

#define PAGE_HEADER FSEG_PAGE_DATA /* index page header starts at this
                                      offset */

#define FSEG_PAGE_DATA FIL_PAGE_DATA

#define FIL_PAGE_DATA 38U /*!< start of the data on the page */

#define PAGE_INDEX_ID 28 /* index id where the page belongs.
This field should not be written to after
page creation. */

Source code:
storage/innobase/include/fil0fil.h#L507
storage/innobase/include/fil0fil.h#L575 
storage/innobase/include/fil0fil.h#L483
storage/innobase/include/btr0btr.ic#L109
storage/innobase/include/page0page.h#L56
storage/innobase/include/fsp0types.h#L68
storage/innobase/include/fil0fil.h#L560
storage/innobase/include/page0page.h#L82

To extract the above information from an .ibd file, I will use a small bash script that reads each page of the file and prints the fields we need:

#!/bin/bash
# Print the page type, stored checksum, and index ID of each page in an .ibd file
FILE=$1

echo "Reading ${FILE}"
PAGE_SIZE=16384
SIZE=$(ls -l ${FILE} | awk '{print $5}')
PAGES=$((SIZE / PAGE_SIZE))
echo "FILE HAS ${PAGES} pages"

# Start at page 1 (page 0 is the FSP header) and stop at the last page of the
# file; the printed number matches InnoDB's own 0-based page numbering
for ((page=1; page<PAGES; page++))
do
  OFFSET=$((page * PAGE_SIZE))
  PAGETYPE_OFFSET=$((OFFSET + 24))      # FIL_PAGE_TYPE: byte 24 of the page
  PAGETYPE=$(xxd -p -l2 -s ${PAGETYPE_OFFSET} ${FILE})
  PAGECHECKSUM_OFFSET=$((OFFSET))       # FIL_PAGE_SPACE_OR_CHKSUM: byte 0 of the page
  PAGECHECKSUM=$(xxd -p -l4 -s ${PAGECHECKSUM_OFFSET} ${FILE})
  INDEXID_OFFSET=$((OFFSET + 38 + 28))  # PAGE_INDEX_ID: FIL_PAGE_DATA (38) + 28
  INDEXID=$(xxd -p -l8 -s ${INDEXID_OFFSET} ${FILE})
  echo "${page} at offset ${OFFSET} has type ${PAGETYPE} and checksum ${PAGECHECKSUM} - ID ${INDEXID}"
done

The last piece of the puzzle is to translate the index ID to our secondary index name. To do that, we will make use of the information_schema tables:

SELECT i.NAME, HEX(i.INDEX_ID) FROM information_schema.INNODB_SYS_INDEXES as i JOIN information_schema.INNODB_SYS_TABLES t USING (TABLE_ID) WHERE t.NAME = 'DB/TABLE';

Reproducible test case

To reproduce the corruption we will use a test table with one primary key, one secondary index, and 1000 rows:

#create table (single quotes keep the shell from treating the backticks as command substitution)
mysql -u root -psekret -e 'CREATE TABLE `tb1` (`ID` int(11) NOT NULL, `b` varchar(100) DEFAULT NULL, PRIMARY KEY (`ID`), KEY `b` (`b`)) ENGINE=InnoDB' test

#populate table
for i in $(seq 1 1000); do mysql -psekret -e "INSERT INTO test.tb1 VALUES (${i}, 'marcelo_$i')"; done

At this point, InnoDB has all the table pages already loaded into the InnoDB buffer pool, so we need to force it to read the pages from disk again. The easiest way is adding innodb_buffer_pool_load_at_startup=OFF to my.cnf and restarting MySQL.
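
A minimal sketch of that change, assuming a systemd-managed service and a my.cnf that already has a [mysqld] section (paths and the service name may differ in your environment):

# /etc/my.cnf (under the existing [mysqld] section)
[mysqld]
innodb_buffer_pool_load_at_startup = OFF

# restart so the buffer pool starts empty
sudo systemctl restart mysqld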

We would now like to check the list of index IDs of the test/tb1 table:

mysql -psekret -e "SELECT i.NAME, HEX(i.INDEX_ID) FROM information_schema.INNODB_SYS_INDEXES as i JOIN information_schema.INNODB_SYS_TABLES t USING (TABLE_ID) WHERE t.NAME = 'test/tb1';"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------+-----------------+
| NAME    | HEX(i.INDEX_ID) |
+---------+-----------------+
| PRIMARY | 30              |
| b       | 31              |
+---------+-----------------+

We will be looking for pages which have a type 45bf (FIL_PAGE_INDEX) and Index ID 31. It’s time to execute our bash script to read the .ibd file:

./read.sh /var/lib/mysql/test/tb1.ibd
Reading /var/lib/mysql/test/tb1.ibd
FILE HAS 10 pages
1 at offset 16384 has type 0005 and checksum 50312eaf - ID 0000000000000000
2 at offset 32768 has type 0003 and checksum 563aeb4c - ID ffffffff0000ffff
3 at offset 49152 has type 45bf and checksum 30ad20b7 - ID 0000000000000030
4 at offset 65536 has type 45bf and checksum 96977741 - ID 0000000000000031
5 at offset 81920 has type 45bf and checksum 5aa506c9 - ID 0000000000000030
6 at offset 98304 has type 45bf and checksum 20f3fe3f - ID 0000000000000030
7 at offset 114688 has type 45bf and checksum 081fbb8f - ID 0000000000000030
8 at offset 131072 has type 45bf and checksum dc67f354 - ID 0000000000000031
9 at offset 147456 has type 45bf and checksum 370eb81c - ID 0000000000000031

Let's take page eight as the page to manipulate. It starts at offset 131072 of tb1.ibd, has page type 45bf, its index ID field is 31, and its checksum is dc67f354. To corrupt the page, we will change the checksum field to dc67f355.

printf '\xdc\x67\xf3\x55' | dd of=/var/lib/mysql/test/tb1.ibd bs=1 seek=131072 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.000121983 s, 32.8 kB/s

Now, next time we try to read this particular page, MySQL will identify a corruption because the checksum doesn’t match. We don’t want the server to assert, so we will change the checksum back to its original value right after we attempt to read the table. This way MySQL will only mark this table as corrupted and as missing in the data dictionary:

mysql -psekret -e "SELECT * FROM test.tb1" & sleep 0.1 && printf '\xdc\x67\xf3\x54' | dd of=/var/lib/mysql/test/tb1.ibd bs=1 seek=131072 count=4 conv=notrunc

That is it! We have provoked an InnoDB secondary index corruption, and all subsequent queries on the table will now fail. You can use this to make sure your monitoring tool and QA process cover this type of situation.

If your version of MySQL has only marked the table as corrupted, a simple CHECK TABLE will bring it back to a usable state. If your version reports the table as missing from the data dictionary, just restart MySQL to force it to re-read the table pages; since we changed the checksum back to its original value, it won't report any error this time.
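
A minimal sketch of both recovery paths, reusing the names from the example above (the service name is an assumption and may differ in your environment):

# if the table is only marked as corrupted, verify and recover it with CHECK TABLE
mysql -psekret -e "CHECK TABLE test.tb1"

# if the table is reported as missing from the data dictionary, restart MySQL instead
sudo systemctl restart mysqld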

Never try to change any MySQL file manually in a production environment. This can cause unrecoverable corruption and data loss.

Jun
27
2019
--

InnoDB ALTER TABLE ADD INDEX and INSERT performance


In my previous blog post, I explained the internals of the sorted index build process. The post ended by saying "there is one disadvantage."

Beginning with MySQL 5.6, many DDL operations, including ALTER TABLE ADD INDEX, became "online", meaning that concurrent SELECTs and DML are allowed while the ALTER is in progress. See the MySQL documentation for online DDL, which confirms that the ALTER TABLE ADD INDEX DDL permits concurrent DML.

The main disadvantage with Sorted Index Builds introduced in 5.7 is the reduced insert performance when ALTER is in progress. In this blog, we specifically talk about single thread insert performance on a table with ALTER ADD INDEX in progress.

If the table is huge, let's say around 600 million rows or more, the inserts can even crash the server. This is especially true for ALTERs that run for hours while a concurrent insert waits for more than 600 seconds: InnoDB's monitor thread crashes the server, stating that the INSERT waited for a latch for more than 600 seconds. This is reported as MySQL Bug#82940.

Is it fixed?

The problem has existed since 5.7 GA, and it is fixed in the latest releases of Percona Server for MySQL, 5.7.26-29 and 8.0.15-6, as part of the PS-3410 bug fix. The number of inserts completed depends on whether the table is compressed or uncompressed, as well as the page size.

Percona's fix has been contributed upstream (Oracle MySQL) at https://github.com/mysql/mysql-server/pull/268 but has yet to be included. We hope Oracle will include the fix in their next 5.7 release for the benefit of the MySQL community.

If you cannot upgrade to PS-5.7.26, one reasonable workaround would be to use pt-online-schema-change. With this tool, ensure that you have disk space at least equal to the original tablespace size.
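
As a hedged sketch, a pt-online-schema-change invocation for this kind of ALTER could look like the following (connection options, database, table, and index definition are all illustrative):

# add the index via a copy-based online schema change instead of a direct ALTER
pt-online-schema-change \
  --alter "ADD INDEX idx_title (title, title2)" \
  D=test,t=t1 --execute

pt-online-schema-change copies the rows into a new table with the desired index, which is why it needs free disk space at least equal to the original tablespace size.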

How much is the improvement?

The % of improvement depends on the test scenario, machine configuration, etc. See details below.

For uncompressed tables, compared to 5.7.25 (baseline), with the fix version (5.7.26), 58% more inserts finished when ALTER ADD INDEX is running.

For compressed tables, 322% more inserts finished when ALTER ADD INDEX is running.

How does it compare to 5.6?

After the fix, for uncompressed tables, the number of inserts (from a single connection) completed during ALTER ADD INDEX is the same as in 5.6.

For compressed tables, the number of inserts completed with 5.6.44 is 43% more than 5.7.26 (which has a fix). This is a bit surprising, and more analysis has to be done to find the reason. A topic for another day.

The problem from a design perspective

As part of a sorted index build, index->lock on the index being built is acquired in X (exclusive) mode. This lock is held for the entire duration of the sorted index build. Yes, you read it right: for the entire duration. See PS-3410 for complete details.

Concurrent inserts will be able to see that there is a ‘new index’ being built. For such indexes, inserts go into Online ALTER logs, which are later executed at the end of ALTER. As part of this, INSERT tries to acquire index->lock in S (shared) mode to see if the index is in an online or aborted state.

Since the sorted index build process holds index->lock in X mode for the entire duration, the concurrent insert waits for this latch. If the index is huge, the insert thread's wait crosses 600 seconds, and the monitor thread crashes the server.

Fix

The fix is rather simple. Sorted index builds do not need to acquire index->lock in X mode. At this stage, there are no concurrent reads on this uncommitted index. Concurrent inserts do not interfere with the sorted index build. They go to online ALTER logs. So it is safe to not acquire the index->lock of the index being built.

Test Case

The following MTR test case is written to show the number of inserts that are concurrently executed while ALTER is running. Note that there is only one connection that does the inserts.

The test is run with innodb_buffer_pool_size = 1G for all versions. Two versions of the table are used. One with regular 16K page size and the other a compressed table with 4K page size.

The data directory is stored in RAM for all tests. You can save the below file (for example as mysql-test/t/alter_insert_concurrency.test) and run the MTR test case as:

./mtr --mem main.alter_insert_concurrency --mysqld=--innodb_buffer_pool_size=1073741824

The test inserts one million rows into the table, runs CREATE INDEX (the same as ALTER TABLE t1 ADD INDEX), and, in another connection, executes INSERTs one by one until the ALTER finishes.

--source include/have_innodb.inc
--source include/count_sessions.inc

connect (con1,localhost,root,,);

CREATE TABLE t1(
  class INT,
  id INT,
  title VARCHAR(100),
  title2 VARCHAR(100)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

DELIMITER |;

CREATE PROCEDURE populate_t1()
BEGIN
  DECLARE i int DEFAULT 1;
  START TRANSACTION;
  WHILE (i <= 1000000) DO
    INSERT INTO t1 VALUES (i, i, uuid(), uuid());
    SET i = i + 1;
  END WHILE;
  COMMIT;
END|

CREATE PROCEDURE conc_insert_t1()
BEGIN
  DECLARE i int DEFAULT 1;
  SELECT COUNT(*) INTO @val FROM INFORMATION_SCHEMA.PROCESSLIST
    WHERE ID != CONNECTION_ID() AND info LIKE "CREATE INDEX%";
  IF @val > 0 THEN
    SELECT "ALTER STARTED";
  END IF;
  WHILE (@val > 0) DO
    INSERT INTO t1 VALUES (i, i, uuid(), uuid());
    SET i = i + 1;
    SELECT COUNT(*) INTO @val FROM INFORMATION_SCHEMA.PROCESSLIST
      WHERE ID != CONNECTION_ID() AND info LIKE "CREATE INDEX%";
  END WHILE;
  SELECT concat('Total number of inserts is ', i);
END|

DELIMITER ;|

--disable_query_log
CALL populate_t1();
--enable_query_log

--connection con1
--send CREATE INDEX idx_title ON t1(title, title2);

--connection default
--sleep 1
--send CALL conc_insert_t1();

--connection con1
--reap
--connection default
--reap
--disconnect con1

DROP TABLE t1;
DROP PROCEDURE populate_t1;
DROP PROCEDURE conc_insert_t1;
--source include/wait_until_count_sessions.inc

Numbers

compressed table, 4K page size : number of concurrent inserts (avg of 6 runs)
==============================   ============================================
PS 5.7.25 (and earlier)        : 2315
PS 5.7.26 (fix version)        : 9785.66 (322% improvement compared to 5.7.25; 43% worse compared to 5.6)
PS 5.6                         : 17341

uncompressed table, 16K page size
=================================
PS 5.7.25 (and earlier)        : 3007
PS 5.7.26 (fix version)        : 4768.33 (58.5% improvement compared to 5.7.25; 3.4% worse compared to 5.6)
PS 5.6                         : 4939

May
08
2019
--

MySQL InnoDB Sorted Index Builds


It's not essential to understand how MySQL® and Percona Server for MySQL build indexes, but understanding the process can help when you want to reserve an appropriate amount of space for future data inserts. From MySQL 5.7, developers changed the way secondary indexes are built for InnoDB, applying a bottom-up rather than the top-down approach used in earlier releases. In this post, I'll walk through an example to show how an InnoDB index is built. At the end, I'll explain how you can use this understanding to set an appropriate value for innodb_fill_factor.

Index building process

To build an index on a table with existing data, InnoDB goes through the following phases:

  1. Read phase (read from clustered index and build secondary index entries)
  2. Merge sort phase
  3. Insert phase (insert sorted records into the secondary index)

Until version 5.6, MySQL built the secondary index by inserting one record at a time. This is a "top-down" approach: the search for the insert position starts from the root (top) and reaches the appropriate leaf page (down), and the record is inserted on the leaf page pointed to by the cursor. It is expensive in terms of finding the insert position and doing page splits and merges (at both root and non-root levels). How do you know that there are too many page splits and merges happening? You can read about that in an earlier blog by my colleague Marco Tusa, here.

From MySQL 5.7, the insert phase during add index uses "Sorted Index Build", also known as "Bulk Load for Index". In this approach, the index is built "bottom-up": leaf pages (bottom) are built first, and then the non-leaf levels up to the root (up).

Use cases

A sorted index build is used in these cases:

  • ALTER TABLE t1 ADD INDEX (or CREATE INDEX)
  • ALTER TABLE t1 ADD FULLTEXT INDEX
  • ALTER TABLE t1 ADD COLUMN, ALGORITHM=INPLACE
  • OPTIMIZE TABLE t1

For the last two use cases, ALTER creates an intermediate table. The intermediate table indexes (both primary and secondary) are built using "sorted index build".

Algorithm

  1. Create a page at Level 0 (leaf level). Also create a cursor to this page.
  2. Insert into the page using the cursor at Level 0 until it is full.
  3. Once the page is full, create a sibling page (do not insert into the sibling page yet).
  4. Create a node pointer (minimum key in child page, child page number) for the current full page and insert the node pointer one level above (the parent page).
  5. At the upper level, check whether a cursor is already positioned. If not, create a parent page and a cursor for that level.
  6. Insert the node pointer into the parent page.
  7. If the parent page is full, repeat steps 3, 4, 5, 6.
  8. Now insert into the sibling page and make the cursor point to the sibling page.
  9. At the end of all inserts, there is a cursor at each level pointing to the rightmost page. Commit all the cursors (meaning commit the mini-transactions that modified the pages and release all the latches).

For the sake of simplicity, the above algorithm skipped the details about compressed pages and the handling of BLOBs (externally stored BLOBs).

Walk through of building an index, bottom-up

Using an example, let's see how a secondary index is built, bottom-up. Again, for simplicity's sake, assume the maximum number of records allowed in leaf and non-leaf pages is three.

CREATE TABLE t1 (a INT PRIMARY KEY, b INT, c BLOB);
INSERT INTO t1 VALUES (1, 11, 'hello111');
INSERT INTO t1 VALUES (2, 22, 'hello222');
INSERT INTO t1 VALUES (3, 33, 'hello333');
INSERT INTO t1 VALUES (4, 44, 'hello444');
INSERT INTO t1 VALUES (5, 55, 'hello555');
INSERT INTO t1 VALUES (6, 66, 'hello666');
INSERT INTO t1 VALUES (7, 77, 'hello777');
INSERT INTO t1 VALUES (8, 88, 'hello888');
INSERT INTO t1 VALUES (9, 99, 'hello999');
INSERT INTO t1 VALUES (10, 1010, 'hello101010');
ALTER TABLE t1 ADD INDEX k1(b);

InnoDB appends the primary key fields to the secondary index. The records of secondary index k1 are of the format (b, a). After the sort phase, the records are

(11,1), (22,2), (33,3), (44,4), (55,5), (66,6), (77,7), (88,8), (99,9), (1010, 10)

Initial insert phase

Let’s start with record (11,1).

  1. Create a page at Level 0 (leaf level).
  2. Create a cursor to the page.
  3. All inserts go to this page until it is full.

The arrow shows where the cursor is currently pointing. It is currently at page number 5, and the next inserts go to this page.

There are two more free slots, so insertion of the records (22,2) and (33,3) is straightforward.

For the next record (44, 4), page number 5 is full. Here are the steps:


  1. Create a sibling page – page number 6
  2. Do not insert into the sibling page yet.
  3. Commit the page at the cursor i.e. mini transaction commit, release latch etc
  4. As part of the commit, create a node pointer and insert it into the parent page at [current level + 1], i.e., at Level 1
  5. The node pointer is of the format (minimum key in child page, child page number). Minimum key in page number 5 is (11,1). Insert record ((11,1),5) at the parent level.
  6. The parent page at Level 1 doesn’t exist yet. MySQL creates page number 7 and a cursor pointing to page number 7.
  7. Insert ((11,1),5) into page number 7
  8. Now, return to Level 0 and create links from page number 5 to 6 and vice versa
  9. The cursor at Level 0 now points to sibling page number 6.
  10. Insert (44,4) into page number 6


The next inserts – of (55,5) and (66,6) – are straightforward and they go to page 6.

Insertion of record (77,7) is similar to (44,4), except that the parent page (page number 7) already exists and has space for two more records. Insert node pointer ((44,4),6) into page 7 first, and then record (77,7) into sibling page 8.

Insertion of records (88,8) and (99,9) is straightforward because page 8 has two free slots.

Next insert (1010, 10). Insert node pointer ((77,7),8) to the parent page (page number 7) at Level 1.

MySQL creates sibling page number 9 at Level 0. Insert record (1010,10) into page 9 and change the cursor to this page.

Commit the cursors at all levels. In the above example, the database commits page 9 at Level 0 and page 7 at Level 1. We now have a complete B+-tree index, built from the bottom up!

Index fill factor

The global variable innodb_fill_factor sets the amount of space in a B-tree page to be used for inserts. The default value is 100, which means the entire page is used (excluding page headers and trailers). A clustered index is an exception: with innodb_fill_factor = 100, 1/16th of the clustered index page space is kept free, i.e., 6.25% of the space is reserved for future DML.

A value of 80 means that MySQL uses 80% of the page for inserts, and leaves 20% for future updates.

If innodb_fill_factor is 100, there is no free space left for future inserts into the secondary index. If you expect more DML on the table after the index is added, this can lead to page splits and merges again. In such cases, it is advisable to use values between 80 and 90. This variable also affects indexes recreated with OPTIMIZE TABLE or ALTER TABLE DROP COLUMN, ALGORITHM=INPLACE.

You should not use values that are too low (for example, below 50) because the index then occupies significantly more disk space. With low values there are more pages in an index, and index statistics sampling might not be optimal; the optimizer could choose wrong query plans with sub-optimal statistics.
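
innodb_fill_factor is a dynamic variable, so a hedged sketch of applying it before an index build could look like this (the table and index names reuse the walk-through example and are illustrative):

# leave ~10% of each index page free for future DML
mysql -e "SET GLOBAL innodb_fill_factor = 90"
# the setting applies to subsequently built indexes
mysql -e "ALTER TABLE test.t1 ADD INDEX k2(b)"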

Advantages of Sorted Index Build

  1. No page splits (excluding compressed tables) and merges.
  2. No repeated searches for insert position.
  3. Inserts are not redo logged (except for page allocations), so there is less pressure on the redo log subsystem.

Disadvantages:

None… Well, OK, there is one, and it deserves a separate blog post. Stay tuned!

Apr
11
2019
--

MySQL 8.0 Architecture and Enhancement Webinar: Q & A


In this blog, I will provide answers to the Q & A for the MySQL 8.0 Architecture and Enhancement webinar.

First, I want to thank everybody for attending my April 9, 2019, webinar. The recording and slides are available here. Below is the list of your questions that I was unable to answer fully during the webinar.

Q: What kinds of encryption levels are provided in MySQL 8.0?

The MySQL data-at-rest encryption feature supports the Advanced Encryption Standard (AES) block-based encryption algorithm.

In MySQL, the block_encryption_mode variable controls the block encryption mode; the default setting is aes-128-ecb. To use aes-256-cbc, for example, set this option under the [mysqld] option group in the MySQL configuration file (/etc/my.cnf):

[mysqld]
block_encryption_mode=aes-256-cbc
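
For InnoDB data-at-rest encryption itself, here is a minimal sketch, assuming a keyring plugin (such as keyring_file) is already loaded; the table name is illustrative:

# create an encrypted InnoDB table (requires a keyring plugin to be active)
mysql -e "CREATE TABLE test.t_secure (id INT PRIMARY KEY) ENGINE=InnoDB ENCRYPTION='Y'"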

Q: At what frequency does the redo log buffer flush the changes to disk? When is the commit variable set to zero?

Here’s an overview:

  • The innodb_flush_log_at_trx_commit variable controls the flush frequency; in MySQL 5.7 and 8.0 it is set to 1 by default. An example of changing it follows this list.
  • innodb_flush_log_at_trx_commit = 0: logs are written and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.
  • innodb_flush_log_at_trx_commit = 1: logs are written and flushed to disk at each transaction commit. This is the default setting and is required for full ACID compliance.
  • innodb_flush_log_at_trx_commit = 2: logs are written after each transaction commit and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.
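
A minimal, hedged sketch of changing the setting at runtime (the variable is dynamic, so no restart is needed; with a value of 2, up to roughly one second of transactions can be lost in a crash):

# relax durability for higher throughput
mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"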

Q: How are persistent variables reset?

Using the RESET PERSIST command, we can remove persisted global system variable settings from mysqld-auto.cnf.

Example:

mysql > SET PERSIST binlog_encryption=ON;
Query OK, 0 rows affected (0.00 sec)
$ cat data/mysqld-auto.cnf
{ "Version" : 1 , "mysql_server" : { "mysql_server_static_options" : { "binlog_encryption" : { "Value" : "ON" , "Metadata" : { "Timestamp" : 1554896858076255 , "User" : "msandbox" , "Host" : "localhost" } } } } }
mysql > RESET PERSIST binlog_encryption;
Query OK, 0 rows affected (0.00 sec)
$ cat data/mysqld-auto.cnf
{ "Version" : 1 , "mysql_server" : { } }

To reset all persisted variables, use the following command:

mysql > RESET PERSIST;
Query OK, 0 rows affected (0.00 sec)

Q: Does pt-config-diff work with these persistent vs. my.cnf variable settings?

No, it will not work, due to the configuration format differences between these files: my.cnf uses an INI-style format, while mysqld-auto.cnf is JSON.

$ pt-config-diff ./my.sandbox.cnf data/mysqld-auto.cnf
Cannot auto-detect the MySQL config format at /home/lalit/perl5/bin/pt-config-diff line 3010.
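
If you only need to inspect what has been persisted, one workaround (not covered in the webinar; it assumes the third-party jq tool is installed) is to pretty-print the JSON file directly:

# jq is not part of the MySQL distribution; install it separately
jq . data/mysqld-auto.cnf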

Q: Regarding the UNDO Tablespace, do we have any specific retention to follow?

This is not required because, in MySQL 8.0, the innodb_undo_log_truncate variable is enabled by default, and it performs an automatic truncate operation on the undo tablespaces. When undo tablespaces exceed the threshold value defined by innodb_max_undo_log_size (default value 1024 MiB), they are marked for truncation.

Truncating an undo tablespace performs the following actions:

  1. Deactivation of the undo tablespace
  2. Truncation of the undo tablespace
  3. Reactivation of the undo tablespace

NOTE: Truncating undo logs that reside in the system tablespace is not supported.
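
As a hedged sketch of checking and tuning this behavior (the 512 MiB threshold below is purely illustrative):

mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_undo_log_truncate'"
# lower the threshold so undo tablespaces are marked for truncation sooner
mysql -e "SET GLOBAL innodb_max_undo_log_size = 536870912"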

Q: Corrupted data on disk where the secondary indexes don't match the primary?

I’ll be happy to respond to this, but I’ll need a little bit more information… feel free to add more detail to the comments section and I’ll see if I can provide some insight.

Thanks for attending this webinar on MySQL 8.0 Architecture and Enhancement. You can find the slides and a recording here.


Mar
04
2019
--

Percona XtraBackup 8.0.5 Is Now Available


Percona is glad to announce the release of Percona XtraBackup 8.0.5 on March 4, 2019. Downloads are available from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, it drives down backup costs while providing unique features for MySQL backups.

Percona XtraBackup 8.0.5 introduces support for undo tablespaces created using the new syntax (CREATE UNDO TABLESPACE) available since MySQL 8.0.14. Percona XtraBackup also supports the binary log encryption introduced in MySQL 8.0.14.

Two new options were added to xbstream. Use the --decompress option with xbstream to decompress individual qpress files. With the --decompress-threads option, specify the number of threads to apply when decompressing. Thanks to Rauli Ikonen for this contribution.
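
A hedged usage sketch combining extraction and decompression (the stream file name and target directory are illustrative):

# extract the streamed backup and decompress qpress-compressed files with 4 threads
xbstream -x --decompress --decompress-threads=4 -C /data/backup < backup.xbstream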

This release of Percona XtraBackup is a General Availability release ready for use in a production environment.

All Percona software is open-source and free.

Please note the following about this release:

  • The deprecated innobackupex has been removed. Use the xtrabackup command to back up your instances: $ xtrabackup --backup --target-dir=/data/backup
  • When migrating from earlier database server versions, back up and restore using Percona XtraBackup 2.4, and then use mysql_upgrade from MySQL 8.0.x
  • If using yum or apt repositories to install Percona XtraBackup 8.0.5, ensure that you have enabled the new tools repository. You can do this with the percona-release enable tools release command and then install the percona-xtrabackup-80 package, as sketched below.
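
For example (a sketch; yum is shown here, and the commands differ for apt-based systems):

percona-release enable tools release
yum install percona-xtrabackup-80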

New Features

  • PXB-1548: Percona XtraBackup enables updating the ib_buffer_pool file with the latest pages present in the buffer pool using the --dump-innodb-buffer-pool option. Thanks to Marcelo Altmann for the contribution.
  • PXB-1768: Added support for undo tablespaces created with the new MySQL 8.0.14 syntax.
  • PXB-1781: Added support for binary log encryption introduced in MySQL 8.0.14.
  • PXB-1797: For xbstream, two new options were added. The --decompress option enables xbstream to decompress individual qpress files. The --decompress-threads option controls the number of threads to apply when decompressing. Thanks to Rauli Ikonen for this contribution.

Bugs Fixed

  • Using --lock-ddl-per-table caused the server to scan all records of partitioned tables, which could lead to an "out of memory" error. Bugs fixed PXB-1691 and PXB-1698.
  • When Percona XtraBackup was run with the --slave-info option, incorrect coordinates were written to the xtrabackup_slave_info file. Bug fixed PXB-1737.
  • Percona XtraBackup could crash at the prepare stage when making an incremental backup if the variable innodb-rollback-segments was changed after starting the MySQL Server. Bug fixed PXB-1785.
  • The full backup could fail when Percona Server was started with the --innodb-encrypt-tables parameter. Bug fixed PXB-1793.

Other bugs fixed: PXB-1632, PXB-1715, PXB-1770, PXB-1771, PXB-1773.
