Oct
08
2018
--

Persistence of autoinc fixed in MySQL 8.0

The release of MySQL 8.0 has brought a lot of bold implementations that touched on things that have been avoided before, such as added support for common table expressions and window functions. Another example is the change in how AUTO_INCREMENT (autoinc) sequences are persisted, and thus replicated.

This new implementation carries the fix for bug #73563 (Replace result in auto_increment value less or equal than max value in row-based), which we only found out about recently. The surprising part is that the use case we were analyzing is a somewhat common one; this must be affecting a good number of people out there.

Understanding the bug

The business logic of the use case is such that a UNIQUE column, found in a table whose id is managed by an AUTO_INCREMENT sequence, needs to be updated, and this is done with a REPLACE operation:

“REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted.”

So, what happens in practice in this particular case is a DELETE followed by an INSERT of the target row.

We will explore this scenario here in the context of an oversimplified currency converter application that uses USD as base reference:

CREATE TABLE exchange_rate (
id INT PRIMARY KEY AUTO_INCREMENT,
currency VARCHAR(3) UNIQUE,
rate FLOAT(5,3)
) ENGINE=InnoDB;

Let’s add a trio of rows to this new table:

INSERT INTO exchange_rate (currency,rate) VALUES ('EUR',0.854), ('GBP',0.767), ('BRL',4.107);

which gives us the following initial set:

master (test) > select * from exchange_rate;
+----+----------+-------+
| id | currency | rate  |
+----+----------+-------+
|  1 | EUR      | 0.854 |
|  2 | GBP      | 0.767 |
|  3 | BRL      | 4.107 |
+----+----------+-------+
3 rows in set (0.00 sec)

Now we update the rate for Brazilian Reais using a REPLACE operation:

REPLACE INTO exchange_rate SET currency='BRL', rate=4.500;

With currency being a UNIQUE field the row is fully replaced:

master (test) > select * from exchange_rate;
+----+----------+-------+
| id | currency | rate  |
+----+----------+-------+
|  1 | EUR      | 0.854 |
|  2 | GBP      | 0.767 |
|  4 | BRL      | 4.500 |
+----+----------+-------+
3 rows in set (0.00 sec)

and thus the autoinc sequence is updated:

master (test) > SHOW CREATE TABLE exchange_rate\G
*************************** 1. row ***************************
     Table: exchange_rate
Create Table: CREATE TABLE `exchange_rate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`currency` varchar(3) DEFAULT NULL,
`rate` float(5,3) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `currency` (`currency`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

The problem is that the autoinc sequence is not updated on the replica as well:

slave1 (test) > select * from exchange_rate;show create table exchange_rate\G
+----+----------+-------+
| id | currency | rate  |
+----+----------+-------+
|  1 | EUR      | 0.854 |
|  2 | GBP      | 0.767 |
|  4 | BRL      | 4.500 |
+----+----------+-------+
3 rows in set (0.00 sec)
*************************** 1. row ***************************
     Table: exchange_rate
Create Table: CREATE TABLE `exchange_rate` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`currency` varchar(3) DEFAULT NULL,
`rate` float(5,3) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `currency` (`currency`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Now, the moment we promote that replica to master and start writing to this table, we'll hit a duplicate key error:

slave1 (test) > REPLACE INTO exchange_rate SET currency='BRL', rate=4.600;
ERROR 1062 (23000): Duplicate entry '4' for key 'PRIMARY'

Note that:

a) the transaction fails and the row is not replaced; however, the autoinc sequence is still incremented:

slave1 (test) > SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE table_schema='test' AND table_name='exchange_rate';
+----------------+
| AUTO_INCREMENT |
+----------------+
|              5 |
+----------------+
1 row in set (0.00 sec)

b) this problem only happens with row-based replication (binlog_format=ROW), where REPLACE in this case is logged as a row UPDATE:

# at 6129
#180829 18:29:55 server id 100  end_log_pos 5978 CRC32 0x88da50ba Update_rows: table id 117 flags: STMT_END_F
### UPDATE `test`.`exchange_rate`
### WHERE
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2='BRL' /* VARSTRING(3) meta=3 nullable=1 is_null=0 */
###   @3=4.107                /* FLOAT meta=4 nullable=1 is_null=0 */
### SET
###   @1=4 /* INT meta=0 nullable=0 is_null=0 */
###   @2='BRL' /* VARSTRING(3) meta=3 nullable=1 is_null=0 */
###   @3=4.5                  /* FLOAT meta=4 nullable=1 is_null=0 */

With statement-based replication—or even mixed format—the REPLACE statement is replicated as is: it will trigger a DELETE+INSERT in the background on the replica and thus update the autoinc sequence in the same way it did on the master.
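
For reference, row events like the ones shown above can be decoded with mysqlbinlog, and you can check which logging format a server uses before deciding whether this bug affects you. A quick sketch (the log file name is just an example; note that SET GLOBAL only affects new connections, and changing the format on a live server should be planned carefully):

mysqlbinlog --verbose --base64-output=decode-rows mysql-bin.000001

master (test) > SHOW GLOBAL VARIABLES LIKE 'binlog_format';
master (test) > SET GLOBAL binlog_format = 'STATEMENT';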

This example (tested with Percona Server versions 5.5.61, 5.6.36 and 5.7.22) helps illustrate the issue with autoinc sequences not being persisted as they should be with row-based replication. However, MySQL’s Worklog #6204 includes a couple of scarier scenarios involving the master itself, such as when the server crashes while a transaction is writing to a table similar to the one used in the example above. MySQL 8.0 remedies this bug.

Workarounds

There are a few possible workarounds to consider if this problem is impacting you and neither upgrading to the 8 series nor resorting to statement-based or mixed replication format is a viable option.

We'll be discussing three of them here: one that revolves around executing checks before a failover (to detect and fix autoinc inconsistencies on replicas); another that requires reviewing all REPLACE statements like the one from our example and adapting them to include the id field, thus avoiding the bug; and finally one that requires changing the schema of affected tables so that the target field becomes the Primary Key of the table while id (autoinc) is converted into a UNIQUE key.

a) Detect and fix

The least intrusive of the workarounds we conceived for the problem at hand, in terms of query and schema changes, is to run a check for each of the tables that might be facing this issue on a replica before we promote it to master in a failover scenario:

slave1 (test) > SELECT ((SELECT MAX(id) FROM exchange_rate)>=(SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE table_schema='test' AND table_name='exchange_rate')) as `check`;
+-------+
| check |
+-------+
|     1 |
+-------+
1 row in set (0.00 sec)

If the table does not pass the test, like ours didn’t at first (just before we attempted a REPLACE after we failed over to the replica), then update autoinc accordingly. The full routine (check + update of autoinc) could be made into a single stored procedure:

DELIMITER //
CREATE PROCEDURE CheckAndFixAutoinc()
BEGIN
 DECLARE done TINYINT UNSIGNED DEFAULT 0;
 DECLARE tableschema VARCHAR(64);
 DECLARE tablename VARCHAR(64);
 DECLARE columnname VARCHAR(64);
 DECLARE cursor1 CURSOR FOR SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') AND EXTRA LIKE '%auto_increment%';
 DECLARE CONTINUE HANDLER FOR NOT FOUND SET done=1;
 OPEN cursor1;
 start_loop: LOOP
  -- Fetch before testing 'done', otherwise the last row would be processed twice
  FETCH cursor1 INTO tableschema, tablename, columnname;
  IF done THEN
    LEAVE start_loop;
  END IF;
  -- Flag tables whose MAX(id) has caught up with (or passed) their AUTO_INCREMENT
  SET @get_autoinc = CONCAT('SELECT @check1 := ((SELECT MAX(', columnname, ') FROM ', tableschema, '.', tablename, ')>=(SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA=\'', tableschema, '\' AND TABLE_NAME=\'', tablename, '\')) as `check`');
  PREPARE stm FROM @get_autoinc;
  EXECUTE stm;
  DEALLOCATE PREPARE stm;
  IF @check1>0 THEN
    BEGIN
      -- Bump the sequence to MAX(id)+1
      SET @select_max_id = CONCAT('SELECT @max_id := MAX(', columnname, ')+1 FROM ', tableschema, '.', tablename);
      PREPARE select_max_id FROM @select_max_id;
      EXECUTE select_max_id;
      DEALLOCATE PREPARE select_max_id;
      SET @update_autoinc = CONCAT('ALTER TABLE ', tableschema, '.', tablename, ' AUTO_INCREMENT=', @max_id);
      PREPARE update_autoinc FROM @update_autoinc;
      EXECUTE update_autoinc;
      DEALLOCATE PREPARE update_autoinc;
    END;
  END IF;
 END LOOP start_loop;
 CLOSE cursor1;
END//
DELIMITER ;
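
Once created, the procedure can be run on the replica as a pre-promotion step in the failover routine:

slave1 (test) > CALL CheckAndFixAutoinc();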

It doesn't allow for as clean a failover as we would like, but it can be helpful if you're stuck with MySQL<8.0 and binlog_format=ROW and cannot make changes to your queries or schema.

b) Include Primary Key in REPLACE statements

If we had explicitly included the id (Primary Key) in the REPLACE operation from our example it would have also been replicated as a DELETE+INSERT even when binlog_format=ROW:

master (test) > REPLACE INTO exchange_rate SET currency='BRL', rate=4.500, id=3;
# at 16151
#180905 13:32:17 server id 100  end_log_pos 15986 CRC32 0x1d819ae9  Write_rows: table id 117 flags: STMT_END_F
### DELETE FROM `test`.`exchange_rate`
### WHERE
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2='BRL' /* VARSTRING(3) meta=3 nullable=1 is_null=0 */
###   @3=4.107                /* FLOAT meta=4 nullable=1 is_null=0 */
### INSERT INTO `test`.`exchange_rate`
### SET
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2='BRL' /* VARSTRING(3) meta=3 nullable=1 is_null=0 */
###   @3=4.5                  /* FLOAT meta=4 nullable=1 is_null=0 */
# at 16199
#180905 13:32:17 server id 100  end_log_pos 16017 CRC32 0xf11fed56  Xid = 184
COMMIT/*!*/;

We could point out that we are doing it wrong by not including the id in the REPLACE statement in the first place; the reason for not doing so would mostly be to avoid an extra lookup for each replace (to obtain the id for the currency we want to update). On the other hand, what if your business logic does expect the id to change with each REPLACE? You should keep such requirements in mind when considering this workaround, as it is effectively a functional change to what we had initially.
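
As a concrete sketch of that extra lookup applied to our example (this assumes the target row already exists; the session variable name is arbitrary):

master (test) > SELECT id INTO @id FROM exchange_rate WHERE currency='BRL';
master (test) > REPLACE INTO exchange_rate SET id=@id, currency='BRL', rate=4.500;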

c) Make the target field the Primary Key and keep autoinc as a UNIQUE key

If we make currency the Primary Key of our table and id a UNIQUE key instead:

CREATE TABLE exchange_rate (
id INT UNIQUE AUTO_INCREMENT,
currency VARCHAR(3) PRIMARY KEY,
rate FLOAT(5,3)
) ENGINE=InnoDB;

the same REPLACE operation will be replicated as a DELETE+INSERT too:

# at 19390
#180905 14:03:56 server id 100  end_log_pos 19225 CRC32 0x7042dcd5  Write_rows: table id 131 flags: STMT_END_F
### DELETE FROM `test`.`exchange_rate`
### WHERE
###   @1=3 /* INT meta=0 nullable=0 is_null=0 */
###   @2='BRL' /* VARSTRING(3) meta=3 nullable=0 is_null=0 */
###   @3=4.107                /* FLOAT meta=4 nullable=1 is_null=0 */
### INSERT INTO `test`.`exchange_rate`
### SET
###   @1=4 /* INT meta=0 nullable=0 is_null=0 */
###   @2='BRL' /* VARSTRING(3) meta=3 nullable=0 is_null=0 */
###   @3=4.5                  /* FLOAT meta=4 nullable=1 is_null=0 */
# at 19438
#180905 14:03:56 server id 100  end_log_pos 19256 CRC32 0x79efc619  Xid = 218
COMMIT/*!*/;

Of course, the same would be true if we had just removed id entirely from the table and kept currency as the Primary Key. This would work in our particular test example, but that won't always be the case. Please note, though, that if you do keep id on the table you must make it a UNIQUE key: this workaround is based on the fact that this key becomes a second unique constraint, which triggers a different code path to log a replace operation. Had we made it a simple, non-unique key instead, that wouldn't be the case.
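
For an existing table, the key swap could be performed with an ALTER along these lines; treat it as a sketch to validate first, since it rebuilds the table and blocks writes while doing so:

ALTER TABLE exchange_rate
  DROP PRIMARY KEY,
  ADD UNIQUE KEY (id),
  ADD PRIMARY KEY (currency);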

If you have any comments or suggestions about the issue addressed in this post, the workarounds we propose, or even a different view of the problem you would like to share please leave a comment in the section below.

Co-Author: Trey Raymond

Trey Raymond is a Sr. Database Engineer for Oath Inc. (née Yahoo!), specializing in MySQL. Since 2010, he has worked to build the company’s database platform and supporting team into industry leaders.

While a performance guru at heart, his experience and responsibilities range from hardware and capacity planning all through the stack to database tool and utility development.

He has a reputation for breaking things to learn something new.

Co-Author: Fernando Laudares

Fernando is a Senior Support Engineer with Percona. Fernando’s work experience includes the architecture, deployment and maintenance of IT infrastructures based on Linux, open source software and a layer of server virtualization. He’s now focusing on the universe of MySQL, MongoDB and PostgreSQL, with a particular interest in understanding the intricacies of database systems, and contributes regularly to this blog. You can read his other articles here.

Apr
15
2015
--

Checking table definition consistency with mysqldiff

Data inconsistencies in replication environments are pretty common. There are lots of posts that explain how to fix them using pt-table-checksum and pt-table-sync. Usually we only care about the data, but from time to time we receive this question in support:

How can I check the table definition consistency between servers?

Replication also allows us to have different table definitions between master and slaves. For example, there are cases where you need some indexes on slaves for querying purposes that are not really needed on the master. There are other cases where those differences are just a mistake that needs to be fixed.

mysqldiff, included in Oracle’s MySQL Utilities, can help us find those differences and get the information we need to fix them. In this post I’m going to show you how to use it with an example.

Find table definition inconsistencies

mysqldiff allows us to find those inconsistencies by checking the differences between tables on the same server (different databases) or on different servers (also possible on different databases). In this example I’m going to search for differences in table definitions between two different servers, server1 and server2.

The command line is pretty simple. This is used to compare the tables on the “test” database:

mysqldiff --server1=user@host1 --server2=user@host2 test:test

If the database name is different:

mysqldiff --server1=user@host1 --server2=user@host2 testdb:anotherdb

If the table name is different:

mysqldiff --server1=user@host1 --server2=user@host2 testdb.table1:anotherdb.anothertable

Now I want to check the table definition consistency between two servers. The database’s name is “employees”:

# mysqldiff --force --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees
# WARNING: Using a password on the command line interface can be insecure.
# server1 on 127.0.0.1: ... connected.
# server2 on 127.0.0.1: ... connected.
# Comparing `employees` to `employees`                             [PASS]
# Comparing `employees`.`departments` to `employees`.`departments`   [FAIL]
# Object definitions differ. (--changes-for=server1)
#
--- `employees`.`departments`
+++ `employees`.`departments`
@@ -1,6 +1,6 @@
 CREATE TABLE `departments` (
   `dept_no` char(4) NOT NULL,
-  `dept_name` varchar(40) NOT NULL,
+  `dept_name` varchar(256) DEFAULT NULL,
   PRIMARY KEY (`dept_no`),
   UNIQUE KEY `dept_name` (`dept_name`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
# Comparing `employees`.`dept_emp` to `employees`.`dept_emp`       [PASS]
# Comparing `employees`.`dept_manager` to `employees`.`dept_manager`   [PASS]
# Comparing `employees`.`employees` to `employees`.`employees`     [FAIL]
# Object definitions differ. (--changes-for=server1)
#
--- `employees`.`employees`
+++ `employees`.`employees`
@@ -5,5 +5,6 @@
   `last_name` varchar(16) NOT NULL,
   `gender` enum('M','F') NOT NULL,
   `hire_date` date NOT NULL,
-  PRIMARY KEY (`emp_no`)
+  PRIMARY KEY (`emp_no`),
+  KEY `last_name` (`last_name`,`first_name`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
# Comparing `employees`.`salaries` to `employees`.`salaries`       [PASS]
# Comparing `employees`.`titles` to `employees`.`titles`           [PASS]
Compare failed. One or more differences found.

There are at least two differences: one in the departments table and another in the employees table. The output is similar to diff’s. By default the tool stops after finding the first difference; that’s why we use --force, to tell the tool to continue checking all the tables.

It shows us that on departments the dept_name is varchar(40) on server1 and varchar(256) on server2. The “employees” table has a KEY (last_name, first_name) on server2 that is not present on server1. Why is it taking server2 as a reference? Because of this line:

# Object definitions differ. (--changes-for=server1)

So, the changes shown on the diff are for server1. If you want server2 to be the one to be changed and server1 used as reference, then --changes-for=server2 would be needed.
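
For example, reusing the connection parameters from the run above:

# mysqldiff --force --changes-for=server2 --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees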

In some cases the diff output is not really useful. We actually need a SQL query to do the changes on the server. We just need to add --difftype=sql to the command line:

# mysqldiff --force --difftype=sql --server1=root:msandbox@127.0.0.1:21489 --server2=root:msandbox@127.0.0.1:21490 employees:employees
[...]
# Comparing `employees`.`departments` to `employees`.`departments`   [FAIL]
# Transformation for --changes-for=server1:
ALTER TABLE `employees`.`departments`
  DROP INDEX dept_name,
  ADD UNIQUE INDEX dept_name (dept_name),
  CHANGE COLUMN dept_name dept_name varchar(256) NULL;
[...]
# Comparing `employees`.`employees` to `employees`.`employees`     [FAIL]
# Transformation for --changes-for=server1:
#
ALTER TABLE `employees`.`employees`
  DROP PRIMARY KEY,
  ADD PRIMARY KEY(`emp_no`),
  ADD INDEX last_name (last_name,first_name);

As we can see, the tool is not perfect. There are two problems here:

1- On the “departments” table it drops a UNIQUE key that is present on both servers only to add it again: a waste of time and resources.

2- On the “employees” table it drops and recreates the PRIMARY KEY, again something that is not needed at all.

I have created a bug report, but this also teaches us a good lesson: don’t just copy and paste commands without double-checking them first.

What does mysqldiff run under the hood?

Mostly queries on INFORMATION_SCHEMA. These are the ones used to check inconsistencies on departments:

SHOW CREATE TABLE `departments`;
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT, AVG_ROW_LENGTH, CHECKSUM, TABLE_COLLATION, TABLE_COMMENT, ROW_FORMAT, CREATE_OPTIONS
  FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE,
         COLUMN_DEFAULT, EXTRA, COLUMN_COMMENT, COLUMN_KEY
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT PARTITION_NAME, SUBPARTITION_NAME, PARTITION_ORDINAL_POSITION,
         SUBPARTITION_ORDINAL_POSITION, PARTITION_METHOD, SUBPARTITION_METHOD,
         PARTITION_EXPRESSION, SUBPARTITION_EXPRESSION, PARTITION_DESCRIPTION
  FROM INFORMATION_SCHEMA.PARTITIONS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT CONSTRAINT_NAME, COLUMN_NAME, REFERENCED_TABLE_SCHEMA,
         REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
  FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments' AND
        REFERENCED_TABLE_SCHEMA IS NOT NULL;
SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, AUTO_INCREMENT, AVG_ROW_LENGTH, CHECKSUM, TABLE_COLLATION, TABLE_COMMENT, ROW_FORMAT, CREATE_OPTIONS
  FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE,
         COLUMN_DEFAULT, EXTRA, COLUMN_COMMENT, COLUMN_KEY
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';
SELECT PARTITION_NAME, SUBPARTITION_NAME, PARTITION_ORDINAL_POSITION,
         SUBPARTITION_ORDINAL_POSITION, PARTITION_METHOD, SUBPARTITION_METHOD,
         PARTITION_EXPRESSION, SUBPARTITION_EXPRESSION, PARTITION_DESCRIPTION
  FROM INFORMATION_SCHEMA.PARTITIONS
  WHERE TABLE_SCHEMA = 'employees' AND TABLE_NAME = 'departments';

As a summary, it checks partitions, row format, collation, constraints and so on.
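
If you want to capture this yourself for mysqldiff or any other client tool, one approach is to briefly enable the general query log while the tool runs (requires appropriate privileges; remember to turn it back off):

SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';
-- run mysqldiff, then inspect what it executed:
SELECT event_time, argument FROM mysql.general_log ORDER BY event_time;
SET GLOBAL general_log = 'OFF';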

Conclusion

There are different tools for different purposes. We can check the data consistency with pt-table-checksum/pt-table-sync, but also the table definitions with mysqldiff.

Apr
03
2015
--

Illustrating Primary Key models in InnoDB and their impact on disk usage

On a recent engagement I worked with a customer who makes extensive use of UUID() values for their Primary Key, storing them as char(36), and their row count on this example table has grown to over 1 billion rows.

The table is INSERT-only (no UPDATEs or DELETEs), and the bulk of their retrievals are PK lookups. Lookups by PK were performing acceptably, but they were concerned with the table’s space usage as it approached 1TB (running with innodb_file_per_table=1 and Percona Server 5.5).

This schema model presents an increasing burden for backups since they use Percona XtraBackup, and so the question was asked: does their choice of an effectively random Primary Key based on UUID() impact their on-disk storage, and to what extent? And as a neat trick I show towards the end of this post how you can calculate the rate of fragmentation in your table on a regular basis if you’re so inclined. So read on!

For background, the more common approach for a Primary Key in InnoDB is one that uses an integer AUTO_INCREMENT value. One of the benefits of a PK AUTO_INCREMENT is that it allows InnoDB to add new entries at the end of the table, and keeps the BTREE index from having to be split at any point.  More on this splitting reference in a moment.  Note also that this blog post isn’t intended to promote one type of model over another, my goal is really to illustrate the impact your choice of PK will have on the data on disk.

A Primary Key serves multiple purposes in InnoDB:

  • Ensures uniqueness between rows
  • InnoDB saves row data on disk clustered by the Primary Key
  • Depending on the type used and the INSERT / UPDATE pattern, provides either an unfragmented or a severely fragmented Primary Key

I wanted to profile three different Primary Key types:

  1. integer AUTO_INCREMENT – this key will consume 4 bytes
  2. binary(16) using Ordered UUID() – as per Karthik’s UUID()-optimised blog post (see the sketch after this list)
  3. char(36) using UUID() – what my customer used
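
For reference, the “ordered UUID” approach rearranges the time-ordered fields of UUID() to the front before packing the value into binary(16). A sketch of that rearrangement, based on the referenced post (verify it against your own workload before relying on it):

SET @u = UUID();
SELECT UNHEX(CONCAT(SUBSTR(@u, 15, 4), SUBSTR(@u, 10, 4), SUBSTR(@u, 1, 8),
                    SUBSTR(@u, 20, 4), SUBSTR(@u, 25))) AS ordered_uuid;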

I then used the powerful tool innodb_space’s function space-lsn-age-illustrate  (from Jeremy Cole’s innodb_ruby project) to plot the LSN (InnoDB’s Log Sequence Number, an always-incrementing value) pages from each table that uses the different Primary Keys via ASCII colour (so hot, right?  Thanks Jeremy!!).  For reference, the legend indicates that the darker the colour, the “older” the page’s updated version is (the LSN), while as you move across the colour spectrum to pink you’re looking at the most recently written LSN values.  What I’m trying to illustrate is that when you use AUTO_INCREMENT or UUID() that has been modified to insert in an ascending order, you get virtually no page splits, and thus consume the minimal amount of database pages.  On the left side you’re looking at the page IDs for this table, and the lower the number of pages consumed, the more efficiently packed the table’s data is within those pages.

This is an example of INSERT-only based on a Primary Key of AUTO_INCREMENT.  Notice how the darker colours are heavy at the earliest pages and lighter as we get to writing out the higher number pages.  Further this table finishes writing somewhere around 700 pages consumed.

[Figure: space-lsn-age-illustrate output for the integer AUTO_INCREMENT Primary Key]

As we look at the optimised-UUID() INSERT pattern we see  that it too has a very evenly distributed pattern with oldest pages at the start (lowest page IDs) and newest written pages at the end of the table.  More pages are consumed however because the Primary Key is wider, so we end somewhere around 1,100 pages consumed.

[Figure: space-lsn-age-illustrate output for the Ordered UUID()-based Primary Key]

Finally we arrive at the UUID() INSERT pattern, and as we expected, the fragmentation is extreme and has caused many page splits — this is the behaviour in InnoDB when a record needs to be written into an existing page (since it falls between two existing values) and InnoDB realises that if this additional value is written that the capacity of the page will be overcommitted, so it then “splits” the page into two pages and writes them both out.  The rash of pink in the image below shows us that UUID() causes significant fragmentation because it is causing pages to be split all throughout the table.  This is deemed “expensive” since the ibd file now is more than 2x greater than the UUID()-optimised method, and about 3x greater than a Primary Key with AUTO_INCREMENT.

[Figure: space-lsn-age-illustrate output for the UUID() Primary Key]

Based on this investigation we determined that the true size of the 1 billion row table was about half the size reported by Linux when examining the .ibd file. We happened to have an opportunity to dump and load the table (mysqldump | mysql) and found that on restore the table consumed 450GB of disk — so our estimate was pretty good!
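
If a dump and reload is not practical, rebuilding the table in place reclaims the fragmented space as well; the table name below is a placeholder, and the rebuild blocks writes for its duration:

ALTER TABLE example_table ENGINE=InnoDB;
-- or, equivalently for InnoDB:
OPTIMIZE TABLE example_table;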

I also wanted to highlight that you can determine the data/pages statistics for yourself. As you can see below, the first two PK distributions are very tight, with pages packed up to 90%; however, the UUID model leaves you just slightly above 50%. You can run this against your prepared backups if you use Percona XtraBackup (since at least version 2.1) by using the --stats option.

[root@mysql]# xtrabackup --stats --datadir=/data/backups/mysql --target-dir=/data/backups/mysql | grep -A5 test | grep -A5 PRIMARY
  table: test/t1, index: PRIMARY, space id: 13, root page: 3, zip size: 0
  estimated statistics in dictionary:
    key vals: 8, leaf pages: 99, size pages: 161
  real statistics:
     level 1 pages: pages=1, data=1287 bytes, data/pages=7%
        leaf pages: recs=60881, pages=99, data=1461144 bytes, data/pages=90%
--
  table: test/t2_uuid_ordered, index: PRIMARY, space id: 14, root page: 3, zip size: 0
  estimated statistics in dictionary:
    key vals: 8, leaf pages: 147, size pages: 161
  real statistics:
     level 1 pages: pages=1, data=3675 bytes, data/pages=22%
        leaf pages: recs=60882, pages=147, data=2191752 bytes, data/pages=91%
--
  table: test/t3_uuid, index: PRIMARY, space id: 15, root page: 3, zip size: 0
  estimated statistics in dictionary:
    key vals: 8, leaf pages: 399, size pages: 483
  real statistics:
     level 2 pages: pages=1, data=92 bytes, data/pages=0%
     level 1 pages: pages=2, data=18354 bytes, data/pages=56%

Below are the table definitions along with the scripts I used to generate the data for this post.

mysql> show create table t1\G
 *************************** 1. row ***************************
 Table: t1
 Create Table: CREATE TABLE `t1` (
 `c1` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `c2` char(1) NOT NULL DEFAULT 'a',
 PRIMARY KEY (`c1`),
 KEY `c2` (`c2`)
 ) ENGINE=InnoDB AUTO_INCREMENT=363876 DEFAULT CHARSET=utf8
 1 row in set (0.00 sec)
mysql> show create table t2_uuid_ordered\G
 *************************** 1. row ***************************
 Table: t2_uuid_ordered
 Create Table: CREATE TABLE `t2_uuid_ordered` (
 `pk` binary(16) NOT NULL,
 `c2` char(1) NOT NULL DEFAULT 'a',
 PRIMARY KEY (`pk`),
 KEY `c2` (`c2`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8
 1 row in set (0.00 sec)
mysql> show create table t3_uuid\G
 *************************** 1. row ***************************
 Table: t3_uuid
 Create Table: CREATE TABLE `t3_uuid` (
 `pk` char(36) NOT NULL,
 `c2` char(1) NOT NULL DEFAULT 'a',
 PRIMARY KEY (`pk`),
 KEY `c2` (`c2`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8
 1 row in set (0.00 sec)

[root@mysql]# cat make_rows.sh
 #!/bin/bash
while [ 1 ] ; do
 mysql -D test -e "insert into t1 (c2) values ('d')" ;
 mysql -D test -e "insert into t2_uuid_ordered (pk, c2) values (ordered_uuid(uuid()), 'a')" ;
 mysql -D test -e "insert into t3_uuid (pk, c2) values (uuid(), 'a')" ;
 done

[root@mysql]# cat space_lsn_age_illustrate.sh
#!/bin/bash
cd /var/lib/mysql
 echo "T1 - AUTO_INCREMENT"
 innodb_space -f test/t1.ibd space-lsn-age-illustrate
 echo "T2 - UUID() ORDERED"
 innodb_space -f test/t2_uuid_ordered.ibd space-lsn-age-illustrate
 echo "T3 - UUID()"
 innodb_space -f test/t3_uuid.ibd space-lsn-age-illustrate

I hope that this post helps you to better understand the impact of random vs ordered Primary Key selection! Please share with me your thoughts on this post in the comments, thanks for reading!

Note to those attentive readers seeking more information: I plan to write a follow-up post that deals with these same models but from a performance perspective. In this post I tried to be as specific as possible with regards to the disk consumption and fragmentation concerns – I feel it addressed the first part and allude to this mysterious “fragmentation” beast but only teases at what that could mean for query response time… Just sit tight, I’m hopeful to get a tag-along to this one post-PLMCE!

By the way, come see me speak at the Percona Live MySQL Conference and Expo in Santa Clara, CA the week of April 13th – I’ll be delivering 5 talks and moderating one Keynote Panel. I hope to see you there!  If you are at PLMCE, attend one of my talks or stop me in the hallway and say “Hi Michael, I read your post, now where’s my beer?” – and I’ll buy you a cold one of your choice :)


Oct
17
2013
--

InnoDB scalability issues due to tables without primary keys

Each day work is done to improve the performance of the InnoDB storage engine and to remove bottlenecks and scalability issues. Here is another one I wanted to highlight:

Scalability issues due to tables without primary keys

This scalability issue is caused by the use of tables without primary keys. It typically shows itself as contention on the InnoDB dict_sys mutex, which controls access to the data dictionary. This mutex is used in various places; I will mention only a few of them:

  • During operations such as opening and closing table handles, or
  • When accessing I_S tables, or
  • During undo of a freshly inserted row, or
  • During other data dictionary modification operations such as CREATE TABLE, or
  • Within the “Persistent Stats” subsystem, among other things.

Of course this list is not exhaustive but should give you a good picture of how heavily it is used.

But the thing is, when you are debugging contention related to a data dictionary control structure, you naturally start to look at things that are directly related to data dictionary modifications. You look for executions of CREATE TABLE, DROP TABLE, TRUNCATE TABLE, etc. But what if none of that is actually causing the contention on the dict_sys mutex? Are you aware that when generating “row-id” values for tables without explicit primary keys (or without non-nullable unique keys) the dict_sys mutex is acquired? So INSERTs to tables with implicit primary keys are an InnoDB system-wide contention point.

Let’s also take a look at the relevant source code.

Firstly, below is the function that does the row-id allocation which is defined in the file storage/innobase/row/row0ins.cc

/***********************************************************//**
Allocates a row id for row and inits the node->index field. */
UNIV_INLINE
void
row_ins_alloc_row_id_step(
/*======================*/
        ins_node_t*     node)   /*!< in: row insert node */
{
        row_id_t        row_id;

        ut_ad(node->state == INS_NODE_ALLOC_ROW_ID);

        if (dict_index_is_unique(dict_table_get_first_index(node->table))) {

                /* No row id is stored if the clustered index is unique */

                return;
        }

        /* Fill in row id value to row */

        row_id = dict_sys_get_new_row_id();

        dict_sys_write_row_id(node->row_id_buf, row_id);
}

Secondly, below is the function that actually generates the row-id which is defined in the file storage/innobase/include/dict0boot.ic

/**********************************************************************//**
Returns a new row id.
@return the new id */
UNIV_INLINE
row_id_t
dict_sys_get_new_row_id(void)
/*=========================*/
{
        row_id_t        id;

        mutex_enter(&(dict_sys->mutex));

        id = dict_sys->row_id;

        if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {

                dict_hdr_flush_row_id();
        }

        dict_sys->row_id++;

        mutex_exit(&(dict_sys->mutex));

        return(id);
}

Finally, I would like to share results of a few benchmarks that I conducted in order to show you how this affects performance.

Benchmarking the effects of the absence of primary keys

First of all, let me share information about the host that was used in the benchmarks. I will also share the MySQL version and InnoDB configuration used.

Hardware

The host was a “hi1.4xlarge” Amazon EC2 instance. The instance comes with 16 vCPUs and 60.5GB of memory. The instance storage consists of 2×1024 SSD-backed storage volumes, and the instance is connected to a 10 Gigabit ethernet network. So the IO performance is very decent. I created a RAID 0 array from the 2 instance storage volumes and created an XFS filesystem on the resultant software RAID 0 volume. This configuration allows us to get the best possible IO performance out of the instance.

MySQL

The MySQL version used was 5.5.34 MySQL Community Server, and the InnoDB configuration looked as follows:

innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 42G
innodb-buffer-pool-instances   = 8
innodb-io-capacity             = 10000
innodb_adaptive_hash_index     = 1

I conducted two different types of benchmarks, and both of them were done by using sysbench.

The first one involved benchmarking the performance of single-row INSERTs for tables with and without explicit primary keys. That’s what I will show first.

Single-row INSERTs

The tables were generated as follows for the benchmark involving tables with primary keys:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test prepare

This resulted in the following table being created:

CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB

While the tables without primary keys were generated as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test prepare

This resulted in the tables being created with the following structure:

CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  KEY `xid` (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB

The actual benchmark for the table with primary keys was run as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-dist-type=uniform --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --max-time=300 --num-threads=16 --max-requests=0 --report-interval=1 run

While the actual benchmark for the table without primary keys was run as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/insert.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --oltp-dist-type=uniform --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --max-time=300 --num-threads=16 --max-requests=0 --report-interval=1 run

Note that the benchmarks were run with three variations in the number of concurrent threads used by sysbench: 16, 32 and 64.
Below is how the graphs look for each of these benchmarks.

[Graph: writes per second, 16 threads]
[Graph: writes per second, 32 threads]
[Graph: writes per second, 64 threads]

Some key things to note from the graphs: the throughput of the INSERTs to the tables without explicit primary keys never goes above 87% of the throughput of the INSERTs to the tables with primary keys defined. Furthermore, as we increase the concurrency, downward spikes start appearing. These become more apparent when we move to a concurrency of 64 threads. This is expected, because contention is supposed to increase as we increase the concurrency of operations that contend on the dict_sys mutex.

Now let’s take a look at how this impacts the bulk load performance.

Bulk Loads

The bulk loads to the tables with primary keys were performed as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/parallel_prepare.lua --oltp-tables-count=64 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --num-threads=16 run

While the bulk loads to the tables without primary keys were performed as follows:

sysbench --test=/root/sysbench/sysbench/tests/db/parallel_prepare.lua --oltp-tables-count=64 --oltp-table-size=1000000 --oltp-secondary --mysql-table-engine=innodb --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-db=test --num-threads=16 run

Note that the benchmarks were again run with three variations in the number of concurrent threads used by sysbench: 16, 32 and 64.
Below is the picture portrayed by the graph.

[Graph: parallel bulk loading of tables]

Here again, you can see how the bulk load time increases as we increase the number of concurrent threads. This again points to the increase in contention on the dict_sys mutex. With 16 threads the bulk load time for tables without primary keys is 107% more than the bulk load time for the tables with primary keys. This increases to 116% with 32 threads and finally 124% with 64 threads.

Conclusion

Tables without primary keys cause a wide range of contention because they rely on acquiring the dict_sys mutex to generate row-id values. This mutex is used at critical places within InnoDB. Hence the effect of a large number of INSERTs to tables without primary keys is not isolated to those tables alone but can be seen very widely. There have been a number of times I have seen tables without primary keys used in many different scenarios, ranging from simple INSERTs to multi-row INSERTs resulting from, for example, an INSERT … SELECT into a table that is being created temporarily. The advice is to always have primary keys present in your tables. Hopefully I have been able to highlight the true impact the absence of primary keys can have.
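
If you do have such tables, adding even a simple surrogate key takes them off the shared row-id path; a minimal sketch (the column name is arbitrary, and the ALTER rebuilds the table):

ALTER TABLE mytable
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;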

