Apr
11
2023
--

Take This Unique Quiz About Duplicate Indexes In MySQL | pt-duplicate-key-checker

Duplicate Indexes In MySQL

Indexes are crucial for optimizing query execution times in databases, but having an excessive number of indexes, or redundant ones, can negatively impact performance. While pt-duplicate-key-checker is the go-to tool for identifying duplicate or redundant indexes in MySQL, it may not catch all duplicates.

In this blog post, we’ll put ourselves to the test and see if we can identify duplicate and redundant indexes in MySQL. Toward the end, we will identify what the pt-duplicate-key-checker doesn’t.

The unique quiz

Consider the following MySQL table definition. Let’s put our brains to work and note any of the duplicate or redundant indexes (play fair, don’t cheat):

CREATE TABLE `table_with_lot_of_trouble` (
`id` int NOT NULL,
`col1` varchar(1) DEFAULT NULL,
`col2` varchar(2) DEFAULT NULL,
`col3` varchar(3) DEFAULT NULL,
`col4` varchar(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
UNIQUE KEY `col1` (`col1`,`col2`),
UNIQUE KEY `col2` (`col2`,`col1`),
UNIQUE KEY `col1_2` (`col1`,`col2`),
UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
UNIQUE KEY `col1_4` (`col1`),
UNIQUE KEY `col1_5` (`col1`),
KEY `idx1` (`col1`,`id`),
KEY `idx2` (`col1`,`col2`),
KEY `idx3` (`col2`,`col1`),
KEY `idx4` (`col1`,`col2`,`col3`),
KEY `idx5` (`col1`,`col2`)
) ENGINE=InnoDB;

While you work on noting down the duplicate indexes in that MySQL table, let me also add some descriptions for duplicate and redundant indexes.

Duplicate index

Duplicate indexes occur when two or more indexes have the same set of columns in the same order. These can occur accidentally due to poor database design or through the use of database management tools that automatically create indexes without checking for duplicates.

Redundant index

Redundant indexes occur when two or more indexes have some overlapping columns. While these may not be exact duplicates, they can still negatively impact database performance.

Both duplicate and redundant indexes can waste disk space and slow down write operations. Each additional index requires additional disk space and inserts, so updates and deletes have to update multiple indexes. Additionally, such indexes can make it harder for the query optimizer to choose the most efficient index, as it has more options to consider.

Test results

Now, I believe you have your list of duplicate keys ready. Let us see what our favorite pt-duplicate-key-checker tells us about the indexes of the table, along with the reasons why they are considered duplicate or redundant.

[root@ip-172-31-82-182 ~]# pt-duplicate-key-checker --databases test --tables table_with_lot_of_trouble
# ########################################################################
# test.table_with_lot_of_trouble
# ########################################################################

# Uniqueness of id ignored because PRIMARY is a duplicate constraint
# id is a duplicate of PRIMARY
# Key definitions:
# UNIQUE KEY `id` (`id`),
# PRIMARY KEY (`id`),
# Column types:
# `id` int not null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `id`;

# Uniqueness of col1_4 ignored because col1_5 is a duplicate constraint
# col1_4 is a duplicate of col1_5
# Key definitions:
# UNIQUE KEY `col1_4` (`col1`),
# UNIQUE KEY `col1_5` (`col1`),
# Column types:
# `col1` varchar(1) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1_4`;

# idx3 is a duplicate of col2
# Key definitions:
# KEY `idx3` (`col2`,`col1`),
# UNIQUE KEY `col2` (`col2`,`col1`),
# Column types:
# `col2` varchar(2) default null
# `col1` varchar(1) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx3`;

# idx4 is a duplicate of col1_3
# Key definitions:
# KEY `idx4` (`col1`,`col2`,`col3`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx4`;

# Uniqueness of col1 ignored because col1_5 is a stronger constraint
# col1 is a left-prefix of col1_3
# Key definitions:
# UNIQUE KEY `col1` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1`;

# Uniqueness of col1_2 ignored because col1_5 is a stronger constraint
# col1_2 is a left-prefix of col1_3
# Key definitions:
# UNIQUE KEY `col1_2` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1_2`;

# idx2 is a left-prefix of col1_3
# Key definitions:
# KEY `idx2` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx2`;

# idx5 is a left-prefix of col1_3
# Key definitions:
# KEY `idx5` (`col1`,`col2`)
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx5`;

# Key idx1 ends with a prefix of the clustered index
# Key definitions:
# KEY `idx1` (`col1`,`id`),
# PRIMARY KEY (`id`),
# Column types:
# `col1` varchar(1) default null
# `id` int not null
# To shorten this duplicate clustered index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx1`, ADD INDEX `idx1` (`col1`);

# ########################################################################
# Summary of indexes
# ########################################################################

# Size Duplicate Indexes 145
# Total Duplicate Indexes 9
# Total Indexes 13

The pt-duplicate-key-checker notes nine duplicate indexes. Could you identify all nine of them? If so, surely you’ve good command over the database schema design. But I wouldn’t write a blog to test your compatibility with pt-duplicate-key-checker.

There is one more duplicate key that pt-duplicate-key-checker is missing; could you identify it? If so, I encourage you to apply at Percona and give me an opportunity to work with smarter brains.

The duplicate unique keys

For those who couldn’t identify the duplicate index, the unidentified duplicate keys are… (drum roll)…

UNIQUE KEY (col1, col2)
UNIQUE KEY (col2, col1)

It follows logically that if a tuple {a, b} is unique, then {b, a} will also be unique. Similar to how Peter Parker is to Spiderman and Gangadhar is to Shaktiman, the set {a, b} is equivalent to the set {b, a}.  This causes the unique key to double-enforce the uniqueness check.

Therefore, having an additional duplicate constraint defined on the same set of columns becomes unnecessary regardless of order. This is specifically true for two-column unique keys only. To optimize your database, you should consider dropping the second unique key or converting it to a secondary index if it is required.

Since you cannot go on and read all table definitions, I wrote a query for you to identify duplicate unique indexes:

mysql> SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME, group_concat(INDEX_NAME) duplic8_UK, COLUMN_NAMES FROM 
 (SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, GROUP_CONCAT(COLUMN_NAME ORDER BY COLUMN_NAME SEPARATOR ',') AS COLUMN_NAMES 
 FROM information_schema.STATISTICS WHERE NON_UNIQUE = 0 AND INDEX_NAME!='PRIMARY' AND INDEX_TYPE = 'BTREE'  
 GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME) X group by TABLE_SCHEMA, TABLE_NAME, COLUMN_NAMES having count(*)> 1;

+--------------+---------------------------------------------+---------------+--------------+
| TABLE_SCHEMA | TABLE_NAME | duplic8_UK | COLUMN_NAMES |
+--------------+---------------------------------------------+---------------+--------------+
| test | table_with_lot_of_trouble | col1_4,col1_5 | col1 |
| test | table_with_lot_of_trouble | col1,col2 | col1,col2 |
+--------------+---------------------------------------------+---------------+--------------+

Also, don’t forget to provide your opinion in the comments section: Should the non-identification issue with pt-duplicate-key-checker be considered a bug report or a feature request?

Conclusion

Percona’s pt-duplicate-key-checker is an amazing tool, but like every other tool, it is not “fool-proof.” While you create your indexes, evaluate them for duplicity.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

May
31
2013
--

The small improvements of MySQL 5.6: Duplicate Index Detection

MySQL 5.6Here at the MySQL Performance Blog, we’ve been discussing the several new features that MySQL 5.6 brought: GTID-based replication, InnoDB Fulltext, Memcached integration, a more complete performance schema, online DDL and several other InnoDB and query optimizer improvements. However, I plan to focus on a series of posts on the small but handy improvements – changes and bug corrections – in MySQL 5.6 that can make our lives easier and have passed almost unnoticed by most (not all) DBAs.

Duplicate Index Detection

I commented about this on my last webinar, but did not have time to analyze it in-depth.  If you try to do something like this in MySQL 5.5, you will succeed without errors or warnings:

mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.22 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.25 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> SHOW CREATE TABLE test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col2` int(11) DEFAULT NULL,
  `col3` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `col2` (`col2`),
  KEY `col2_2` (`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

In previous versions of MySQL, you can create two indexes with the same columns (in the same order) and the server will not complain.

If we execute the same sentences in MySQL 5.6, the second ALTER will also succeed -and the index will be created-, but we will get a warning (note severity, to be exact):

mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.36 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected, 1 warning (0.56 sec)
Records: 0  Duplicates: 0  Warnings: 1
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1831
Message: Duplicate index 'col2_2' defined on the table 'test.test'. This is deprecated and will be disallowed in a future release.
1 row in set (0.00 sec)

As the message points correctly, this is a human mistake, as it is a waste of resources that could potentially impact our performance, and should be avoided. By the way, a good practice to avoid this is always naming your keys with a consistent pattern. This new behavior was introduced in 5.6.7 with the closing of this bug (although this was initially accepted as a bug as early as 2005!).

The report explains more in detail what the “will be disallowed in a future release” means. In MySQL 5.7 the checks will be stricter: in the default SQL mode, a duplicate index will throw a warning instead of a note. In strict mode, it will throw an error and the second ALTER will fail, preventing the creation of the duplicate index.

Does it mean that tools like pt-duplicate-key-checker will not be necessary for MySQL 5.6? Let’s have a look at the code implementing this feature. The warning will only be thrown if the index has not been created automatically, it is not a foreign key, and it has the exact column definition in the same order. In other words, it checks for duplicate keys, but not redundant ones. What is the difference? Let’s see an example. If we execute:

mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE test ADD INDEX (col2, col3);
Query OK, 0 rows affected (0.39 sec)
Records: 0  Duplicates: 0  Warnings: 0

We get no warnings and no errors in 5.6, as the indexes are different. But as you may know, we can use the second index not only for filtering on both columns, but also for filtering by just the first one. Why can’t MySQL enforce this kind of constraints? For many reasons: the first one because it would break 99% of all applications out there that use MySQL, for which we at Percona tend to find redundant indexes. And second, because in some cases, we may need to have what at first seems redundant indexes but effectively they are not -for example, if one of the two indexes was unique or a foreign key.

This is the output of our tool when run on the same table, correctly identifying the redundancy:

$ pt-duplicate-key-checker --tables test.test
# ########################################################################
# test.test
# ########################################################################
# col2 is a left-prefix of col2_2
# Key definitions:
#   KEY `col2` (`col2`),
#   KEY `col2_2` (`col2`,`col3`)
# Column types:
#         `col2` int(11) default null
#         `col3` varchar(200) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`test` DROP INDEX `col2`;
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   5
# Total Duplicate Indexes  1
# Total Indexes            3

Additionally, pt-duplicate-key-checker will detect subtle redundancies that are engine-dependent, like redundant suffixes for secondary keys in InnoDB. As some of this redundancies could be necessary, depending on the query optimizer and the MySQL version, we always recommend to check manually the optimizations proposed by Percona Toolkit. The MySQL server, of course, cannot risk to block directly all cases.

A set of MySQL utilities were introduced by Oracle recently, which includes mysqlindexcheck, similar to pt-duplicate-key-checker, but it does not detect all cases. For example:

mysql> alter table test add index redundant (col2, id);
Query OK, 0 rows affected (1.57 sec)
Records: 0  Duplicates: 0  Warnings: 0
$ mysqlindexcheck --server=user:pass@localhost test.test
# Source on localhost: ... connected.
$ pt-duplicate-key-checker --tables test.test
# ########################################################################
# test.test
# ########################################################################
# Key redundant ends with a prefix of the clustered index
# Key definitions:
#   KEY `redundant` (`col2`,`id`)
#   PRIMARY KEY (`id`),
# Column types:
#         `col2` int(11) default null
#         `id` int(11) not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `test`.`test` DROP INDEX `redundant`, ADD INDEX `redundant` (`col2`);
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   9
# Total Duplicate Indexes  1
# Total Indexes            2

By the way, if you want to get more familiar with this and other particularities of the latest MySQL GA release, have a look at our upcoming sessions for the “Moving to 5.6″ training course in America (Austin, San Jose) and Europe (Manchester, Utrecht).

The post The small improvements of MySQL 5.6: Duplicate Index Detection appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com