Oct 28, 2020
--

Say Hello to Libcoredumper – A New Way to Generate Core Dumps, and Other Improvements

In a perfect world, we expect all software to run flawlessly and never have problems such as bugs and crashes. We also know that this perfect world doesn’t exist, so we had better be as prepared as possible to troubleshoot those situations. Historically, generating core dumps has been a task delegated to the kernel. If you are curious about how to enable it via the Linux kernel, you can check out Getting MySQL Core file on Linux. There are a few drawbacks that pose either a limitation or a huge strain to get it working, such as:

  • A system-wide configuration is required, and this is not something a DBA always has access to.
  • It is difficult or impossible to enable it for a specific binary only; the standard ways enable it for every piece of software running on the box.
  • Nowadays, with cloud and containers, this task has become even more difficult because it sometimes requires containers to run in privileged mode and the host OS to be properly configured by the provider.

The above issues have driven the exploration of alternative ways to create a core dump to help troubleshoot bugs and crashes. More details can be found at PS-7114.

The Libcoredumper

The libcoredumper is a fork of the archived project google-coredumper. Percona has forked it under Percona-Lab Coredumper, cherry-picked a few improvements from other forks of the same project, and enhanced it to compile and work on newer versions of Linux as well on newer versions of GCC and CLANG.

This project is a Tech Preview, as you may infer from the repository name (Percona-Lab). We might not maintain compatibility with future kernel versions and/or patches. One should test the core dumper in their environment before putting this tool into production. We have tested it on kernel versions up to 5.4.

This functionality is present on all versions of Percona Server for MySQL and Percona XtraDB Cluster starting from 5.7.31 and 8.0.21. If you compile PS/PXC from source, you can control if the library will be compiled by switching -DWITHCOREDUMPER to ON/OFF (default is ON).

How To Configure It

A new variable named coredumper has been introduced. One should include it under the [mysqld] section of my.cnf, and it works independently of the older core-file setting. The new variable can be used either as a boolean (no value specified) or with a value, following a few rules:

  • No value – the core dump will be saved under the MySQL datadir and will be named core.
  • A path ending with / – the core dump will be saved under the specified directory and will be named core.
  • A full path with a filename – the core dump will be saved under the specified directory and will use the specified name.
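A minimal sketch of the three forms in my.cnf (the paths below are hypothetical examples, and only one form should be active at a time, which is why the alternatives are commented out):

```ini
[mysqld]
# 1) boolean form: core saved in the datadir as core.<timestamp>
coredumper

# 2) directory form (note the trailing /): saved as /var/crash/core.<timestamp>
# coredumper = /var/crash/

# 3) full path with filename: saved as /var/crash/mysqld.core.<timestamp>
# coredumper = /var/crash/mysqld.core
```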

Every core file will end with the timestamp of the crash instead of PID, for two main reasons:

  • Make it easier to correlate a core dump with a crash, as MySQL always prints a Zulu/UTC timestamp in the log when it crashes:
    10:02:09 UTC - mysqld got signal 11 ;
  • Operators/containers will always be running MySQL (or whatever application they run) as PID 1. If MySQL has crashed multiple times, we don’t want a core dump to get overwritten by the last crash.

How To Know If I Am Using libcoredumper

When MySQL attempts to write a core file, it stamps the log saying it will do so. When it delegates the action to the Linux kernel, you always see a message like the one below:

. . .
Writing a core file

The above behavior remains the same; however, when MySQL is using libcoredumper to generate the core file, one should see a message informing that the library will be responsible for the action:

. . .
Writing a core file using lib coredumper

Other Improvements

Apart from libcoredumper, starting from the same 5.7 and 8.0 releases, a stack trace will also:

  • Print the binary BuildID – This information is very useful for support/development people in case the MySQL binary that crashed is a stripped binary. Stripping is a technique that removes parts of a binary that are not essential for it to run, making the binary occupy less space on disk and in memory. When computers had memory restrictions, this technique was widely used. Nowadays it no longer poses a limitation on most hardware; however, it is becoming popular once again with containers, where image size matters. Stripping a binary removes its symbols table, which is required to resolve a stack trace and to read the core dump. The BuildID is how we can link things together again.
  • Print the server version – This information is also useful to have at a glance. Recent versions of MySQL/Percona Server for MySQL have fixes for many known issues, so having this information helps establish the starting point of an investigation. MySQL only prints the server version when it starts, and by the time a server crashes, its log may have grown significantly or even been rotated/truncated.

Here is an example of what a crash with a stack trace looks like:

14:23:52 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 55b4b87f230554777d28c6715557ee9538d80115
Server Version: 8.0.21-12-debug Source distribution

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x46000
/usr/local/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x55) [0x55943894c280]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x2e0) [0x559437790768]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13f40) [0x7f9d413bcf40]
/lib/x86_64-linux-gnu/libc.so.6(__poll+0x49) [0x7f9d40858729]
/usr/local/mysql/bin/mysqld(Mysqld_socket_listener::listen_for_connection_event()+0x64) [0x55943777db6a]
/usr/local/mysql/bin/mysqld(Connection_acceptor<Mysqld_socket_listener>::connection_event_loop()+0x30) [0x55943737266e]
/usr/local/mysql/bin/mysqld(mysqld_main(int, char**)+0x30c6) [0x559437365de1]
/usr/local/mysql/bin/mysqld(main+0x20) [0x559437114005]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7f9d4076db6b]
/usr/local/mysql/bin/mysqld(_start+0x2a) [0x559437113f2a]
Please help us make Percona Server better by reporting any
bugs at https://bugs.percona.com/

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
Writing a core file using lib coredumper

TL;DR

Libcoredumper serves as an alternative to the current --core-file functionality for generating memory dumps. In case of a MySQL crash, a core dump is written and can later be processed/read via GDB to understand the circumstances of the crash.
Users can enable it by adding the below variable to the [mysqld] section of my.cnf:

[mysqld]
coredumper

Percona Server for MySQL versions starting from 5.7.31 and 8.0.21 include the library by default. Refer to the below documentation pages for more details:

https://www.percona.com/doc/percona-server/5.7/diagnostics/libcoredumper.html

https://www.percona.com/doc/percona-server/5.7/diagnostics/stacktrace.html

Summary

If you have faced any issue or limitation enabling core dumps before, feel free to test the new versions of Percona Server for MySQL/Percona XtraDB Cluster and use libcoredumper. Any feedback on how we can improve the troubleshooting of bugs/crashes even further is very welcome.

Oct 27, 2020
--

Using MySQL 8 Persisted System Variables

This blog discusses the new persisted system variables feature and how we can use it to make variable changes persistent. The MySQL server maintains system variables that control its operation. The dynamic variables used prior to the MySQL 8 release are not persistent and are reset upon restart. These variables can be changed at runtime using the SET statement to affect the operation of the current server instance, but we have to manually update the my.cnf config file to make the changes persistent. In many cases, updating my.cnf from the server side is not a convenient option, and a variable that is only updated dynamically reverts on the subsequent restart without any history.

Persisted system variables are one of the useful features introduced in MySQL 8. The new functionality helps DBAs update the variables dynamically and register them without touching the configuration files from the server-side.

How to Persist the Global System Variables?

Like SET GLOBAL, SET PERSIST is a command that can be used to update system variables at runtime and make them persistent across restarts. When we use the PERSIST keyword, the variable changes are written to the mysqld-auto.cnf option file in the data directory. mysqld-auto.cnf is a JSON-format file created upon the first execution of a PERSIST or PERSIST_ONLY statement.

Let’s see how this feature works using an example updating the max connections.

mysql> SET PERSIST max_connections = 1000;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@max_connections\G
*************************** 1. row ***************************
@@max_connections: 1000
1 row in set (0.00 sec)

The resulting mysqld-auto.cnf looks like this:

cat /var/lib/mysql/mysqld-auto.cnf
{ "Version" : 1 , "mysql_server" : { "max_connections" : { "Value" : "1000" , "Metadata" : { "Timestamp" : 1602543199335909 , "User" : "root" , "Host" : "localhost" } } } }
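Since mysqld-auto.cnf is plain JSON, it can be pretty-printed with any JSON tool. For example (shown here against an inline sample; on a real server you would point the command at /var/lib/mysql/mysqld-auto.cnf):

```shell
# Pretty-print persisted settings; the inline sample mirrors the file above.
echo '{ "Version" : 1 , "mysql_server" : { "max_connections" : { "Value" : "1000" } } }' \
  | python3 -m json.tool
```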

How to Persist the Read-Only System Variables?

When there is a need to change a read-only variable, we need to use the PERSIST_ONLY keyword. This clause writes the change to the mysqld-auto.cnf file but does not apply it to the running server; the change takes effect upon the next MySQL restart. This makes PERSIST_ONLY suitable for configuring read-only system variables that can be set only at server startup.

mysql>  SET PERSIST innodb_log_file_size=50331648*2;
ERROR 1238 (HY000): Variable 'innodb_log_file_size' is a read-only variable
mysql> set persist_only innodb_log_file_size=50331648*2;
Query OK, 0 rows affected (0.01 sec)

How to Clear the Persisted System Variable Settings?

We can use the RESET PERSIST command to remove persisted settings from mysqld-auto.cnf. Be careful when running the command without a specific variable name, as it will remove all the settings from the configuration file. In effect, it removes the persisted settings from mysqld-auto.cnf but does not change the running values in MySQL.

See the below examples.

mysql> RESET PERSIST;
Query OK, 0 rows affected (0.00 sec)
cat /var/lib/mysql/mysqld-auto.cnf
{ "Version" : 1 , "mysql_server" : {  } }

mysql> select @@max_connections;
+-------------------+
| @@max_connections |
+-------------------+
|              1000 |
+-------------------+
1 row in set (0.01 sec)

If you want to clear a specific variable instead of clearing all settings from the configuration file, the following example shows how to do this. If we try to remove a variable that does not exist in mysqld-auto.cnf, it results in an error as shown below; we can use the IF EXISTS clause to suppress the error.

mysql> RESET PERSIST max_connections;
Query OK, 0 rows affected (0.00 sec)
mysql> RESET PERSIST innodb_max_dirty_pages_pct;
ERROR 3615 (HY000): Variable innodb_max_dirty_pages_pct does not exist in persisted config file
mysql>
mysql> RESET PERSIST IF EXISTS innodb_max_dirty_pages_pct;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
| Warning | 3615 | Variable innodb_max_dirty_pages_pct does not exist in persisted config file |

Is There a Way to Disable Persistence?

Yes, the persisted_globals_load parameter is used to enable or disable the loading of persisted system variables. When it is disabled, the server startup sequence ignores the mysqld-auto.cnf file. Manually editing the mysqld-auto.cnf file may result in a parse error at server startup; in this case, the server reports an error and exits. If this issue occurs, you have to start the server with the persisted_globals_load system variable disabled or with the --no-defaults option, as in the example below.

mysql> select @@persisted_globals_load ;
+-------------------------------------+
| @@persisted_globals_load            |
+-------------------------------------+
|                 1                   |
+-------------------------------------+
1 row in set (0.00 sec)

grep -i persisted_globals_load /etc/my.cnf
persisted_globals_load=0

mysql> restart;
Query OK, 0 rows affected (0.00 sec)

mysql>  select @@persisted_globals_load ;
+-----------------------------------+
|    @@persisted_globals_load       |
+-----------------------------------+
|                 0                 |
+-----------------------------------+
1 row in set (0.00 sec)

mysql> select @@max_connections;
+-------------------+
| @@max_connections |
+-------------------+
|               500 |
+-------------------+
1 row in set (0.00 sec)

What Grants Are Required?

Considering security, granting the right permissions to the right user is definitely a best practice. SYSTEM_VARIABLES_ADMIN and PERSIST_RO_VARIABLES_ADMIN are the privileges required for a user to use SET PERSIST_ONLY to persist global system variables to mysqld-auto.cnf.

The user also needs to have the SHUTDOWN privilege to be able to use the RESTART command. It provides a way to restart MySQL from the client session without requiring command-line access on the server host.

mysql> CREATE USER 'admin_persist'@'localhost' IDENTIFIED BY '*********';
Query OK, 0 rows affected (0.02 sec)

mysql> GRANT SYSTEM_VARIABLES_ADMIN, PERSIST_RO_VARIABLES_ADMIN, SHUTDOWN on *.* to 'admin_persist'@'localhost';
Query OK, 0 rows affected (0.03 sec)

How to Monitor the Variables

To list the variables that have been updated using the PERSIST option, we can query the performance_schema.persisted_variables table, joining it with a couple of other tables as shown below. This is a simple example of how to monitor the variables from the MySQL side; based on your needs, you can modify the query.

mysql> select v.VARIABLE_NAME,g.VARIABLE_VALUE current_value,p.VARIABLE_VALUE as persist_value,SET_TIME,SET_USER,VARIABLE_SOURCE,VARIABLE_PATH from performance_schema.variables_info v JOIN performance_schema.persisted_variables p USING(VARIABLE_NAME) JOIN performance_schema.global_variables g USING(VARIABLE_NAME)\G
*************************** 1. row ***************************
  VARIABLE_NAME: innodb_log_file_size
  current_value: 50331648
  persist_value: 100663296
       SET_TIME: 2020-10-12 18:54:35.725177
       SET_USER: arun
VARIABLE_SOURCE: COMPILED
  VARIABLE_PATH:

*************************** 2. row ***************************
  VARIABLE_NAME: max_connections
  current_value: 1000
  persist_value: 1000
       SET_TIME: 2020-10-12 18:53:19.336115
       SET_USER: root
VARIABLE_SOURCE: DYNAMIC
  VARIABLE_PATH:
2 rows in set (0.06 sec)

mysql> restart;
Query OK, 0 rows affected (0.01 sec)

select v.VARIABLE_NAME,g.VARIABLE_VALUE current_value,p.VARIABLE_VALUE as persist_value,SET_TIME,SET_USER,VARIABLE_SOURCE,VARIABLE_PATH from performance_schema.variables_info v JOIN performance_schema.persisted_variables p USING(VARIABLE_NAME) JOIN performance_schema.global_variables g USING(VARIABLE_NAME)\G
*************************** 1. row ***************************
  VARIABLE_NAME: innodb_log_file_size
  current_value: 100663296
  persist_value: 100663296
       SET_TIME: 2020-10-12 18:54:35.725177
       SET_USER: arun
VARIABLE_SOURCE: PERSISTED
  VARIABLE_PATH: /var/lib/mysql/mysqld-auto.cnf

*************************** 2. row ***************************
  VARIABLE_NAME: max_connections
  current_value: 1000
  persist_value: 1000
       SET_TIME: 2020-10-12 18:53:19.335909
       SET_USER: root
VARIABLE_SOURCE: PERSISTED
  VARIABLE_PATH: /var/lib/mysql/mysqld-auto.cnf
2 rows in set (0.16 sec)

Hopefully, this blog helped explain what persisted system variables are and how they operate.

Oct 27, 2020
--

A First Glance at Amazon Aurora Serverless RDS

If you often deploy services in the cloud, you have certainly, at least once, forgotten to stop a test instance. I am like you, and I have forgotten my share of them. Another mistake I make once in a while is to provision a bigger instance than needed, just in case, and forget to downsize it. While this is true for compute instances, it is especially true for database instances. Over time, this situation ends up adding a cost premium. In this post, we’ll discuss a solution to mitigate these extra costs: the use of the RDS Aurora Serverless service.

What is Amazon Aurora Serverless?

Last spring, Amazon unveiled a new database-related product: RDS Aurora Serverless. The aim of this new product is to simplify the management of Aurora clusters. It brings a likely benefit to end users: better control over cost. Here are some of the benefits we can expect from this product:

  • Automatic scaling up
  • Automatic scaling down
  • Automatic shutdown after a period of inactivity
  • Automatic startup

The database is constantly monitored, and if the load grows beyond a given threshold, a bigger Aurora instance is added to the cluster, the connections are moved, and the old instance is dropped. The opposite happens when a low load is detected. Also, if the database is completely inactive for some time, it is automatically stopped and restarted when needed. The RDS Aurora Serverless cluster type is available for MySQL (5.6 and 5.7) and PostgreSQL (10.12).

Architecture

The RDS Aurora Serverless architecture is similar to the regular RDS Aurora one. There are three main components: a proxy layer handling the endpoints, the servers processing the queries, and the storage. The proxy layer and the storage are about the same. As the name implies, what is dynamic with the Aurora Serverless type are the servers.

There are not many details available as to how things are actually implemented, but the proxy layer is likely able to transfer a connection from one server to another when there is a scale-up or scale-down event. Essentially, we can assume that when the cluster is modified, the steps are the following:

  1. A new Aurora server instance is created with the new size
  2. The new instance is added to the Aurora cluster
  3. The writer role is transferred to the new instance
  4. The existing connections are moved
  5. The old instance is removed

 

How To Configure It

The configuration of an RDS Aurora Serverless cluster is very similar to that of a regular Aurora cluster; there are just a few additional steps. First, of course, you need to choose the serverless type:

RDS Aurora Serverless cluster

Then you have to specify the limits of your cluster under “Capacity”. The capacity unit is the ACU, which stands for Aurora Capacity Unit. I couldn’t find the exact meaning of an ACU; the documentation says: “Each ACU is a combination of processing and memory capacity.” An ACU seems to provide about 2GB of RAM, and the range of possible values is 1 to 256. You set the minimum and maximum ACU you want for the cluster in the following dialog box:

Aurora Capacity Unit

The last step is to specify the inactivity timeout after which the database is paused:

specify the inactivity timeout
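If you prefer the CLI over the console, the same settings can be sketched with the AWS CLI. The identifiers and values below are placeholders, and the exact engine name depends on your engine version; check the RDS documentation before using this:

```shell
# Sketch: create an Aurora Serverless (v1) MySQL cluster scaling between
# 1 and 8 ACU, auto-pausing after 300 seconds of inactivity.
# All identifiers and credentials are placeholders.
aws rds create-db-cluster \
  --db-cluster-identifier my-serverless-cluster \
  --engine aurora-mysql \
  --engine-mode serverless \
  --master-username admin \
  --master-user-password '********' \
  --scaling-configuration MinCapacity=1,MaxCapacity=8,AutoPause=true,SecondsUntilAutoPause=300
```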

How It Works

Startup

If the Aurora Serverless cluster has no running server instances, an attempt to connect to the database will trigger the creation of a new instance.  This process takes some time.  I used a simple script to measure the connection time after an inactivity timeout and found the following statistics:

Min = 31s
Max = 54s
Average = 42s
StdDev = 7.1s
Count = 17

You’ll need to make sure the application can handle a connection that takes close to a minute to complete while the database resumes. I got caught a few times with sysbench timing out after 30s. It is important to remember that the initial capacity used is the same as the one in use when the Aurora Serverless instance stopped, unless you enabled the “Force scaling the capacity…” parameter in the configuration.
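One way to handle the slow first connection is to retry with a generous delay. Here is a minimal, library-agnostic sketch (the connect function is injected, so any MySQL driver could be plugged in; the commented usage and its parameters are hypothetical):

```python
import time

def connect_with_retry(connect_fn, attempts=5, delay=15):
    """Call connect_fn() until it succeeds, waiting `delay` seconds
    between attempts to ride out an Aurora Serverless resume."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect_fn()
        except Exception as exc:  # catch driver-specific errors in real code
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error

# Hypothetical usage with a real driver:
# conn = connect_with_retry(
#     lambda: mysql.connector.connect(host="...", connect_timeout=60))
```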

Pause

If an Aurora Serverless cluster is idle for more than its defined inactivity time, it is automatically paused. Inactivity here is defined in terms of active connections, not queries: an idle connection doing nothing will prevent the Aurora Serverless instance from stopping. If you intend to use the automatic pause feature, I recommend setting wait_timeout and interactive_timeout to values in line with the cluster inactivity time.
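For example, with a 300-second pause timeout, the session timeouts could be set just below it. The values here are illustrative; note that on RDS Aurora these are set through the DB cluster parameter group rather than a my.cnf file, but the settings are the same:

```ini
[mysqld]
# Illustrative values: close idle sessions before a 300s
# Aurora Serverless inactivity timeout, so the cluster can pause.
wait_timeout=240
interactive_timeout=240
```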

Scale Up

A process monitors the Aurora Serverless instance, and if it sees a performance issue that could be solved by the use of a larger instance type, it triggers a scale-up event. When there is an ongoing scale-up (or scale-down) event, you’ll see a process like this one in the MySQL process list:

call action start_seamless_scaling('AQEAAEgBh893wRScvsaFbDguqAqinNK7...

Bear in mind a scale-up event can take some time, especially if the server is very busy. While doing some benchmarks, I witnessed more than 200s on a few occasions. The query load is affected for a few seconds when the instances are swapped.

To illustrate the scale-up behavior, I ran a modified sysbench benchmark to force some CPU load. Here’s a 32-thread benchmark scanning a table on an Aurora Serverless cluster with an initial capacity of 1.

Aurora Serverless sysbench benchmark

The first scale-up happened a little after 600s, while the second one occurred around 1100s. The second event didn’t improve the load much, but that is likely an artifact of the benchmark. It took a long time to increase the capacity from 1 to 2, which could be related to the high CPU usage on the instance. There is usually a small disruption of the query load when the instances are swapped, but nothing too bad.

Scale Down

While scale-up events happen as needed, scale-down events are throttled to about one per 5 minutes, or 15 minutes if the previous scaling event was a scale-up.

Pros and Cons of Aurora Serverless

The RDS Aurora Serverless offering is very compelling for many use cases. It reduces cost and simplifies management. However, you must accept the inherent limitations, like the long startup time when the instance was paused and the small hiccups when the capacity is modified. If you cannot cope with the startup time, you can configure the instance so it doesn’t pause; it will just scale down to a capacity of 1, which seems to map to a t3.small instance type.

Of course, such an architecture imposes some drawbacks. Here’s a list of a few cons:

  • As we have seen, the scale up time is affected by the database load
  • Failover can also take more time than normally expected, especially if the ACU value is high
  • You are limited to one node although, at an ACU of 256, it means a db.r4.16xlarge
  • No public IP but you can set up a Data API
  • The application must be robust in the way it deals with database connections because of possible delays and reconnections

Cost Savings

The cost of an RDS Aurora cluster has three components: the instance costs, the IO costs, and the storage costs. The Aurora Serverless offering affects only the instance costs. The cost is a flat rate per capacity unit per hour. As with normal instances, the costs are region-dependent. The lowest is found in us-east, at $0.06 USD per capacity unit per hour.

If we consider a database used by web developers during the day, which can be paused outside of normal work hours and during weekends, the savings can be above $240/month when the database only runs an average of eight hours per working day.
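To make that estimate concrete, here is one illustrative scenario: a steady 8 ACU at the us-east rate of $0.06/ACU-hour, running 8 hours a day for roughly 22 working days a month. The capacity and day counts are assumptions for the sake of the arithmetic, not a quote:

```python
RATE = 0.06   # USD per ACU-hour (us-east, per the post)
ACU = 8       # assumed steady capacity

always_on = RATE * ACU * 24 * 30   # running 24/7 for a 30-day month
work_hours = RATE * ACU * 8 * 22   # 8h/day, 22 working days
savings = always_on - work_hours

print(f"always on:       ${always_on:.2f}/month")
print(f"work hours only: ${work_hours:.2f}/month")
print(f"savings:         ${savings:.2f}/month")
```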

Oct 26, 2020
--

MySQL 8.0.22: Asynchronous Replication Automatic Connection (IO Thread) Failover

MySQL 8.0.22 was released on Oct 19, 2020, and came with nice features and a lot of bug fixes. Now, you can configure your async replica to choose a new source in case the existing source connection (IO thread) fails. In this blog, I am going to explain the entire process involved in this configuration with a use case.

Overview

This feature is very helpful for keeping your replica server in sync in case the current source fails.

To activate asynchronous connection failover, we need to set the “SOURCE_CONNECTION_AUTO_FAILOVER=1” on the “CHANGE MASTER” statement.

Once the IO connection fails, the replica first retries the existing source based on MASTER_RETRY_COUNT and MASTER_CONNECT_RETRY; only then does it perform the failover.

The feature only works when the IO connection fails, for example because the source crashed or stopped, or because of a network failure. It will not work if the replica is manually stopped using STOP REPLICA.

We have two new functions, which help to add and delete server entries from the source list:

  • asynchronous_connection_failover_add_source — Arguments: (‘channel’, ’host’, port, ’network_namespace’, weight)
  • asynchronous_connection_failover_delete_source — Arguments: (‘channel’, ’host’, port, ’network_namespace’)

The source servers need to be configured in the table mysql.replication_asynchronous_connection_failover. We can also use the table performance_schema.replication_asynchronous_connection_failover to view the available servers in the source list.

Requirements

  • GTID should be enabled on all the servers.
  • MASTER_AUTO_POSITION should be enabled on the replica for auto-positioning purposes (CHANGE MASTER).
  • The user and password should be the same on all the source servers.
  • Replication user and password must be set for the channel using the CHANGE MASTER .. FOR CHANNEL statement. 

Use Case

I have two data centers and three servers (dc1, dc2, report).

  • “dc1” and “report” servers are in the same data center.
  • “dc2” is in a different data center.
  • “dc1” and “dc2” are in active-passive async replication setup (dc1 – active, dc2 – passive)
  • “report” is configured as an async replica under “dc1” for reporting purposes.

Here, my requirement is: if the active node “dc1” fails, the “report” server should be reconfigured under “dc2” to keep receiving live data, without manual work after the failure happens.

Configuration for Automatic Connection Failover

I have installed MySQL 8.0.22 on all three servers and configured the active – passive replication between “dc1” and “dc2”.

[root@dc1 ~]# mysql -e "select @@version, @@version_comment\G"
*************************** 1. row ***************************
        @@version: 8.0.22
@@version_comment: MySQL Community Server - GPL

At dc1,

mysql> show replica status\G
                  Source_Host: dc2
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
1 row in set (0.00 sec)

At dc2,

mysql> show replica status\G
                  Source_Host: dc1
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
1 row in set (0.00 sec)

Now, I need to configure the “report” server as an async replica under “dc1” with automatic failover options.

At report,

mysql> change master to
    -> master_user='Autofailover',
    -> master_password='Autofailover@321',
    -> master_host='dc1',
    -> master_auto_position=1,
    -> get_master_public_key=1,
    -> source_connection_auto_failover=1,
    -> master_retry_count=3,
    -> master_connect_retry=10
    -> for channel "herc7";
Query OK, 0 rows affected, 2 warnings (0.03 sec)

 

source_connection_auto_failover : To activate the automatic failover feature.

master_retry_count, master_connect_retry : The default settings are huge (master_retry_count = 86400, master_connect_retry = 60); with those, we would need to wait 60 days (86400 × 60 / 60 / 60 / 24) for the failover. So, I reduced the settings to make the failover happen after 30 seconds (10 × 3).
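The arithmetic behind those two settings, as a quick check (simply multiplying the retry count by the retry interval):

```python
# Defaults: 86400 retries x 60s between retries before the replica gives up
# on the old source and fails over.
default_wait_s = 86400 * 60
print(default_wait_s / 60 / 60 / 24, "days")   # 60.0 days

# Reduced settings used above: 3 retries x 10s.
reduced_wait_s = 3 * 10
print(reduced_wait_s, "seconds")               # 30 seconds
```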

mysql> start replica for channel "herc7";
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show replica status\G
                  Source_Host: dc1
                Connect_Retry: 10
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
        Seconds_Behind_Source: 0
                Last_IO_Error: 
    Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Source_Retry_Count: 3
      Last_IO_Error_Timestamp: 
                Auto_Position: 1
                 Channel_Name: herc7
1 row in set (0.00 sec)

You can see that replication has started and the failover settings are applied. The current source is “dc1”. Now, I am going to use the function to add the “dc2” server details into the source list for the failover.

At “report”,

mysql> select asynchronous_connection_failover_add_source('herc7', 'dc2', 3306, '', 50);
+------------------------------------------------------------------------------+
| asynchronous_connection_failover_add_source('herc7', 'dc2', 3306, '', 50)    |
+------------------------------------------------------------------------------+
| The UDF asynchronous_connection_failover_add_source() executed successfully. |
+------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from mysql.replication_asynchronous_connection_failover\G
*************************** 1. row ***************************
     Channel_name: herc7
             Host: dc2
             Port: 3306
Network_namespace: 
           Weight: 50
1 row in set (0.00 sec)

It shows the source list is updated with dc2 details. We are good to perform the failover now. 

I am going to shut down the MySQL service on dc1. 

At dc1, 

[root@dc1 ~]# service mysqld stop
Redirecting to /bin/systemctl stop mysqld.service
[root@dc1 ~]#

At report server,

mysql> show replica status\G
                 Source_Host: dc1
                Connect_Retry: 10
           Replica_IO_Running: Connecting
          Replica_SQL_Running: Yes
        Seconds_Behind_Source: NULL
                Last_IO_Error: error reconnecting to master 'Autofailover@dc1:3306' - retry-time: 10 retries: 2 message: Can't connect to MySQL server on 'dc1' (111)
    Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Source_Retry_Count: 3
      Last_IO_Error_Timestamp: 201019 21:32:26
                Auto_Position: 1
                 Channel_Name: herc7
1 row in set (0.00 sec)

The IO thread is in the “Connecting” state. This means it is trying to re-establish the connection to the existing source (dc1), based on the “master_retry_count” and “master_connect_retry” settings.

After 30 seconds,

mysql> show replica status\G
                  Source_Host: dc2
                Connect_Retry: 10
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
        Seconds_Behind_Source: 0
                Last_IO_Error: 
    Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Source_Retry_Count: 3
      Last_IO_Error_Timestamp: 
                Auto_Position: 1
                 Channel_Name: herc7
1 row in set (0.00 sec)

You can see the Source_Host changed to “dc2”. So, the “report” server performed the auto failover and connected to “dc2”. 

From the error log,

2020-10-19T21:32:16.247460Z 53 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'herc7': error reconnecting to master 'Autofailover@dc1:3306' - retry-time: 10 retries: 1 message: Can't connect to MySQL server on 'dc1' (111), Error_code: MY-002003
2020-10-19T21:32:26.249887Z 53 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'herc7': error reconnecting to master 'Autofailover@dc1:3306' - retry-time: 10 retries: 2 message: Can't connect to MySQL server on 'dc1' (111), Error_code: MY-002003
2020-10-19T21:32:36.251989Z 53 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'herc7': error reconnecting to master 'Autofailover@dc1:3306' - retry-time: 10 retries: 3 message: Can't connect to MySQL server on 'dc1' (111), Error_code: MY-002003
2020-10-19T21:32:36.254585Z 56 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-10-19T21:32:36.256170Z 56 [System] [MY-010562] [Repl] Slave I/O thread for channel 'herc7': connected to master 'Autofailover@dc2:3306',replication started in log 'FIRST' at position 196
2020-10-19T21:32:36.258628Z 56 [Warning] [MY-010549] [Repl] The master's UUID has changed, although this should not happen unless you have changed it manually. The old UUID was f68b8693-1246-11eb-a6c0-5254004d77d3.

The first three lines show that it tried to reconnect to the existing primary source “dc1” at 10-second intervals. There was no response from “dc1”, so it performed the failover to “dc2” (connected to master ‘Autofailover@dc2:3306’).

It works perfectly!

Is Failback Possible?

Let’s experiment with the two scenarios below:

  • What happens if the primary node comes back online?
  • Does it perform a failback in case the server with higher weight comes back online?

What happens if the primary node comes back online?

I am going to start the “dc1”, which was shut down earlier to test the failover.

At “dc1”,

[root@dc1 ~]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
[root@dc1 ~]# mysql -e "show status like 'uptime'\G"
*************************** 1. row ***************************
Variable_name: Uptime
        Value: 4

Let’s see the replication on the “report” server.

mysql> show replica status\G
                  Source_Host: dc2
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
    Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates
                 Channel_Name: herc7
1 row in set (0.00 sec)

No changes. It is still connected to “dc2”. Failback has not happened.

Does it perform a failback in case the server with higher weight comes back online?

To test this, I again shut down MySQL on “dc1” and updated the source list on the “report” server (dc1 weight > dc2 weight).

select asynchronous_connection_failover_add_source('herc7', 'dc1', 3306, '', 70)\G

mysql> select * from replication_asynchronous_connection_failover\G
*************************** 1. row ***************************
     Channel_name: herc7
             Host: dc1
             Port: 3306
Network_namespace: 
           Weight: 70
*************************** 2. row ***************************
     Channel_name: herc7
             Host: dc2
             Port: 3306
Network_namespace: 
           Weight: 50
2 rows in set (0.00 sec)

You can see the server “dc1” is configured with a higher weight (70). Now I am going to start the MySQL service on “dc1”.

At “dc1”,

[root@dc1 ~]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
[root@dc1 ~]# mysql -e "show status like 'uptime'\G"
*************************** 1. row ***************************
Variable_name: Uptime
        Value: 37

At “report” server,

mysql> show replica status\G
                  Source_Host: dc2
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
    Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)

No changes, so once the failover is done to the new source, automatic failback will not happen until the new source goes down.

From MySQL doc:

Once the replica has succeeded in making a connection, it does not change the connection unless the new source stops or there is a network failure. This is the case even if the source that became unavailable and triggered the connection change becomes available again and has a higher priority setting.

This solution is also very helpful in (cluster + async replica) environments: you can automatically switch the connection to another cluster node in case the existing source cluster node fails. If your network is not stable, consider setting appropriate retry values, since low thresholds may cause frequent failovers.
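For completeness, the source list can also be pruned at runtime with the companion UDF. A sketch of removing dc1 from the herc7 channel, for example during planned maintenance:

```sql
-- Arguments: channel, host, port, network namespace
SELECT asynchronous_connection_failover_delete_source('herc7', 'dc1', 3306, '');
```

The remaining rows in performance_schema.replication_asynchronous_connection_failover then determine the failover candidates.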

Oct
23
2020
--

MySQL New Releases and Percona XtraBackup Incompatibilities

MySQL Percona Backup Incompatibilities

Earlier this week, Oracle released its Q4 release series. As in previous releases, backward compatibility has been broken with previous versions of the server, this time in both MySQL 5.7 and 8.0:

MySQL 5.7.32

While our QA team was performing extensive tests on it, we found that this version introduces a new compression format version. This change breaks backward compatibility with older versions of MySQL, which is expected in the 8.0 series but not in 5.7. As Percona XtraBackup (PXB) is based on MySQL code, this makes MySQL 5.7.32 incompatible with Percona XtraBackup 2.4.20 and prior.

The issue not only affects Percona XtraBackup but also prevents users from downgrading the server from 5.7.32 to any lower version in the 5.7 series – more details at https://bugs.mysql.com/bug.php?id=101266.

In summary, if you have tables with the compression flag, as below:

CREATE TABLE t1 (c1 INT) COMPRESSION="zlib";

The issue will manifest if a user on 5.7.32:

  • Creates a new compressed table.
  • Runs any ALTER TABLE that uses the copy algorithm (table rebuild) on a compressed table.

At this moment, we advise users using compressed tables to hold the upgrade to 5.7.32.
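To check whether your instance has such tables before upgrading, a sketch that inspects the data dictionary (the COMPRESSION attribute shows up in the CREATE_OPTIONS column):

```sql
-- List tables created with the page-compression attribute
SELECT TABLE_SCHEMA, TABLE_NAME, CREATE_OPTIONS
FROM INFORMATION_SCHEMA.TABLES
WHERE CREATE_OPTIONS LIKE '%COMPRESSION%';
```

If this returns no rows, the 5.7.32 compression format change does not affect you.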

We are currently working on making Percona XtraBackup 2.4.21 fully compatible with 5.7.32.

MySQL 8.0.22

Percona XtraBackup 8.0.14 (the latest version available) is not compatible with MySQL 8.0.22 due to disk format changes introduced in the 8.0.22 release.

WL#13782: InnoDB: Add dynamic config option to use fallocate() on Linux introduced a new redo log record, MLOG_FILE_EXTEND, which is written on file extension and does not depend on the --innodb-extend-and-initialize option. Unfortunately, this time the redo log format version was not bumped. During backup, Percona XtraBackup 8.0.14 cannot parse this new redo log record, so the backup fails.

If by chance MLOG_FILE_EXTEND has already been checkpointed, PXB does not see the new record during backup. This leads to a misleadingly successful backup that cannot be prepared. Let’s see why.

Bug#31587625: PERFORMANCE DEGRADATION AFTER WL14073
This bug fix in 8.0.22 increased the DD version to 8022. During prepare, PXB de-serializes the SDI from the IBD file to bootstrap the data dictionary. Due to the higher DD_VERSION in the SDI, PXB 8.0.14 cannot deserialize it, and prepare fails.

At this moment, we advise all users to hold the upgrade to 8.0.22.

We are working on these incompatibilities; look out for an upcoming PXB release to take successful, consistent backups of 8.0.22.

Oct
23
2020
--

CVE-2020-26542: SimpleLDAP Authentication in Percona Server for MySQL, Percona Server for MongoDB

CVE-2020-26542


When using SimpleLDAP authentication in conjunction with Microsoft’s Active Directory, Percona has discovered a flaw that allows authentication to complete when a blank value is passed for the account password, granting access to the service integrated with Active Directory at the level granted to the authenticating account.

Applicability

Percona Server for MySQL

Percona Server for MySQL 8.x < 8.0.21

Percona XtraDB Cluster

Percona XtraDB Cluster 8.x < 8.0.20.11-3

Percona Server for MongoDB

Only the exact minor versions listed here are affected: 3.6.19-7.0, 4.0.18-11, 4.0.19-12, 4.0.20-13, 4.2.5-5, 4.2.6-6, 4.2.7-7, 4.2.8-8, 4.2.9-9, 4.4.0-1, 4.4.1-2

More Information

https://jira.percona.com/browse/PS-7358

https://jira.percona.com/browse/PSMDB-726

Release Notes

https://www.percona.com/doc/percona-distribution-mysql/8.0/release-notes-pxc-v8.0.20.upd2.html

Percona Distribution for MySQL (PXC Variant) 8.0.20, Fixes For Security Vulnerability: Release Roundup October 13, 2020

 

Oct
22
2020
--

Using Volume Snapshot/Clone in Kubernetes

Volume snapshot and clone Kubernetes

One of the most exciting storage-related features in Kubernetes is volume snapshot and clone. It allows you to take a snapshot of a data volume and later clone it into a new volume, which opens up a variety of possibilities like instant backups or testing upgrades. This feature also brings Kubernetes deployments closer to cloud providers, which let you take volume snapshots with one click.

Word of caution: for the database, it still might be required to apply fsfreeze and FLUSH TABLES WITH READ LOCK or LOCK BINLOG FOR BACKUP.

It is much easier in MySQL 8 now because, with atomic DDL, MySQL 8 should provide crash-safe consistent snapshots without additional locking.
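A minimal sketch of that locking, assuming the snapshot itself is triggered from outside MySQL while the lock is held in this session:

```sql
-- Quiesce writes so the snapshot captures a consistent state
FLUSH TABLES WITH READ LOCK;
-- ... take the volume snapshot / fsfreeze from another shell here ...
UNLOCK TABLES;
```

Keep the session open for the whole snapshot; the lock is released when the session disconnects.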

Let’s review how we can use this feature with Google Cloud Kubernetes Engine and Percona Kubernetes Operator for XtraDB Cluster.

First, the snapshot feature is still beta, so it is not available by default. You need to use GKE version 1.14 or later and you need to have the following enabled in your GKE: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver#enabling_on_a_new_cluster.

It is done by enabling “Compute Engine persistent disk CSI Driver“.

Now we need to create a Cluster using storageClassName: standard-rwo for PersistentVolumeClaims. So the relevant part in the resource definition looks like this:

persistentVolumeClaim:
        storageClassName: standard-rwo
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 11Gi

Let’s assume we have cluster1 running:

NAME                                               READY   STATUS    RESTARTS   AGE
cluster1-haproxy-0                                 2/2     Running   0          49m
cluster1-haproxy-1                                 2/2     Running   0          48m
cluster1-haproxy-2                                 2/2     Running   0          48m
cluster1-pxc-0                                     1/1     Running   0          50m
cluster1-pxc-1                                     1/1     Running   0          48m
cluster1-pxc-2                                     1/1     Running   0          47m
percona-xtradb-cluster-operator-79d786dcfb-btkw2   1/1     Running   0          5h34m

And we want to clone the cluster into a new cluster, provisioned with the same dataset. Of course, this can be done by restoring a backup into a new volume, but snapshot and clone make it much easier. There are still some additional required steps; I will list them as a cheat sheet.

1. Create a VolumeSnapshotClass (I am not sure why this one is not present by default):

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
        name: onesc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

2. Create a snapshot:

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: snapshot-for-newcluster
spec:
  volumeSnapshotClassName: onesc
  source:
    persistentVolumeClaimName: datadir-cluster1-pxc-0
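Assuming the manifest above is saved as snapshot.yaml (a hypothetical file name), applying it and checking that the snapshot is ready before cloning could look like this:

```shell
kubectl apply -f snapshot.yaml
# Wait until this prints "true" before cloning from the snapshot
kubectl get volumesnapshot snapshot-for-newcluster \
  -o jsonpath='{.status.readyToUse}'
```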

3. Clone into a new volume

Here I should note that we need to follow the volume name convention used by the Percona XtraDB Cluster Operator:

datadir-<CLUSTERNAME>-pxc-0

Where CLUSTERNAME is the name used when we create the cluster. So now we can clone the snapshot into a volume:

datadir-newcluster-pxc-0

Where newcluster is the name of the new cluster.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-newcluster-pxc-0
spec:
  dataSource:
    name: snapshot-for-newcluster
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 11Gi

Important: storageClassName, accessModes, and the requested storage size in the volume spec should match the original volume.

After the volume claim is created, we can start newcluster. However, there is still a caveat: we need to use

forceUnsafeBootstrap: true

because otherwise, Percona XtraDB Cluster will think the data from the snapshot was not left by a clean shutdown (which is true) and will refuse to start.

There is still a limitation to this approach which you may find inconvenient: the volume can be cloned only within the same namespace, so it can’t be easily transferred from the PRODUCTION namespace into the QA namespace.

It can still be done, but it requires some extra steps and Kubernetes admin privileges; I will show how in a following blog post.

Oct
22
2020
--

Webinar November 10: Google Cloud Platform – MySQL at Scale with Reliable HA

webinar Percona Google Cloud Platform

Google Cloud Platform (GCP), with its CloudSQL offering, has become a leading platform for database-as-a-service workload deployments for many organizations. Scale and high availability have surfaced as primary goals for many of these deployments. Unfortunately, attaining these objectives has been challenging.

Often, the answer has been to simply add more CloudSQL databases. Many, however, have found a better solution in Percona’s fully managed MySQL environment based in Google’s GCE. Percona’s fully managed MySQL offering provides benefits similar to CloudSQL, plus the ability to run MySQL with an unlimited number of tables and much more reliable database availability. This has empowered these organizations to reclaim control over their architecture decisions. With Percona’s fully managed MySQL, your database architecture choices are once again based on your needs and the needs of your workload, rather than the capabilities of the underlying platform. Learn more about GCP and the benefits of Percona’s fully-managed database service to scale and consolidate your databases.

Please join Stephen Thorn and Michal Nosek, Percona Solution Engineers, on Tuesday, November 10th, at 1 pm EDT for their webinar “Google Cloud Platform: MySQL at Scale with Reliable HA“.

Register for Webinar

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

Oct
16
2020
--

Danger of Changing Default of log_error_verbosity on MySQL/Percona Server for MySQL 5.7

Changing Default of log_error_verbosity mysql

Changing the default value (3) of log_error_verbosity in MySQL/Percona Server for MySQL 5.7 can have a hidden, unintended effect! What does log_error_verbosity do, exactly? As per the documentation:

The log_error_verbosity system variable specifies the verbosity for handling events intended for the error log.

Basically, a value of 1 logs only [Error] messages; 2 adds [Warning] messages; and 3 adds [Note] messages.

For example, one might be tempted to change the default of log_error_verbosity since the error.log might be bloated with thousands (or millions) of lines with [Notes], such as:

2020-10-08T17:02:56.736179Z 3 [Note] Access denied for user 'root'@'localhost' (using password: NO)
2020-10-08T17:04:48.149038Z 4 [Note] Aborted connection 4 to db: 'unconnected' user: 'root' host: 'localhost' (Got timeout reading communication packets)

(P.S. You can read more about those Notes in these other Percona blog posts:)

Fixing MySQL 1045 Error: Access Denied

MySQL “Got an error reading communication packet”

The problem is, after lowering log_error_verbosity to 1 or 2, no messages about server startup or shutdown are printed to the log! That can make troubleshooting really hard in the event of issues or a system failure. For completeness, with the default log_error_verbosity, the following should be seen in the 5.7 error.log at startup:

2020-10-08T16:36:07.302168Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-10-08T16:36:07.302188Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
...
2020-10-08T16:36:07.303998Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-10-08T16:36:07.307823Z 0 [Note] InnoDB: Completed initialization of buffer pool
...
2020-10-08T16:36:07.497571Z 0 [Note] /usr/sbin/mysqld: ready for connections.

And on shutdown:

2020-10-08T16:36:10.447002Z 0 [Note] Giving 0 client threads a chance to die gracefully
2020-10-08T16:36:10.447022Z 0 [Note] Shutting down slave threads
2020-10-08T16:36:10.447027Z 0 [Note] Forcefully disconnecting 0 remaining clients
…
2020-10-08T16:36:12.104099Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

With log_error_verbosity = 2, there won’t be messages about MySQL startup, but some warnings that are only printed at startup might hint at the time of a server restart, such as:

2020-10-08T16:30:21.966844Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-10-08T16:30:22.181367Z 0 [Warning] CA certificate ca.pem is self signed.
2020-10-08T16:30:22.221732Z 0 [Warning] Changed limits: table_open_cache: 431 (requested 6000)

If there is no information about a server restart in the error log, the system logs can be checked:

# cat /var/log/messages 
...
Oct  8 16:31:25 carlos-tutte-latest57-standalone-1 systemd: Starting MySQL Server...
Oct  8 16:31:26 carlos-tutte-latest57-standalone-1 systemd: Started MySQL Server.

If there is still no clue as to when MySQL was last started, checking the “Uptime” status variable can help calculate the last startup time.
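A sketch of that calculation (this assumes show_compatibility_56=OFF, the 5.7 default, so status variables are read from performance_schema):

```sql
-- Estimate the last startup time from the Uptime status variable
SELECT NOW() - INTERVAL VARIABLE_VALUE SECOND AS last_startup
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Uptime';
```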

The problem does NOT occur on MySQL/Percona Server for MySQL 8.0 since even with log_error_verbosity = 1, the following startup/shutdown lines are printed on the error.log:

2020-10-08T16:31:54.532504Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.20-11) starting as process 1052
2020-10-08T16:31:54.540342Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2020-10-08T16:31:55.026815Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2020-10-08T16:31:55.136125Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: '/var/lib/mysql/mysqlx.sock' bind-address: '::' port: 33060
2020-10-08T16:31:55.270669Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.20-11'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona Server (GPL), Release 11, Revision 5b5a5d2.
2020-10-08T16:32:01.708932Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.20-11)  Percona Server (GPL), Release 11, Revision 5b5a5d2.

In conclusion, if possible, avoid changing the default log_error_verbosity on MySQL/Percona Server for MySQL 5.7. If you do need to change it, do it online with SET GLOBAL instead of through the config file: that way, the default still applies at the next startup after a restart, so the startup messages are not lost.
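A minimal sketch of that runtime-only change:

```sql
-- Lower verbosity at runtime only; the my.cnf default (3) still
-- applies after a restart, so startup/shutdown notes are kept.
SET GLOBAL log_error_verbosity = 2;
SHOW GLOBAL VARIABLES LIKE 'log_error_verbosity';
```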

Oct
12
2020
--

Ask An Expert On Percona’s Community Forum!

Percona Community Forum

Have you recently visited Percona’s Community Forum? It’s your hub for direct Q&A with top database experts, including Percona CEO Peter Zaitsev and CTO Vadim Tkachenko. Last quarter over 450 users participated, including 45 engineers from Percona’s staff. Since it was first launched in 2006 and revamped earlier this year, our Forum has built up a mountain of easily-searched database expertise.

This free online community is open to everyone from newbies to maestros. Register as a user and then ask your question, or visit the unanswered question list and help someone else. You’ll feel great doing so, plus you’ll earn points and advance in rank, building your online reputation as vaibhav_upadhyay40, Fan, Federico Razzoli, Ghan, djp, Björn, rdab100, Stateros, gordan, Venkat, and others have done.

Our Forum Q&A covers all flavors of MySQL, MongoDB, and PostgreSQL, as well as key utilities like ProxySQL and PMM. Plus, it’s the only site announcing all of Percona’s new software releases and Percona’s occasional urgent software alerts. You can even personalize email notifications to track only certain categories and skip the rest. And we promise never to spam you!

A few of our most popular posts illustrate how it all works:

So what’s the fine print? Most importantly, remember that it’s volunteers who answer, so answers may not be timely, might not be in-depth, or occasionally might never arrive at all. Response time now averages five days. Remember never to share confidential details, as everything on the Forum is public.

The Forum operates on a spirit of self-help, so do a reasonable amount of your own research before popping off a question. And if you get help, try to give back help too. Everything depends on a spirit of reciprocity and goodwill. The Forum is for those who love databases and want to share their enthusiasm with others, and for those who want to master the database craft.

Finally, as our lawyers make us say, answers are “as-is” meaning Percona does not guarantee accuracy and disclaims all liability. Our Help Articles, Code of Conduct, and Terms of Service explain it all.

So register as a user and give the Percona Forum a try! As always we welcome your suggestions and feedback to Community-Team@Percona.com.
