Apr 11, 2023

Take This Unique Quiz About Duplicate Indexes In MySQL | pt-duplicate-key-checker

Indexes are crucial for optimizing query execution times in databases, but having an excessive number of indexes, or redundant ones, can negatively impact performance. While pt-duplicate-key-checker is the go-to tool for identifying duplicate or redundant indexes in MySQL, it may not catch all duplicates.

In this blog post, we’ll test ourselves and see whether we can identify the duplicate and redundant indexes in a MySQL table. Toward the end, we will look at what pt-duplicate-key-checker misses.

The unique quiz

Consider the following MySQL table definition. Let’s put our brains to work and note any of the duplicate or redundant indexes (play fair, don’t cheat):

CREATE TABLE `table_with_lot_of_trouble` (
`id` int NOT NULL,
`col1` varchar(1) DEFAULT NULL,
`col2` varchar(2) DEFAULT NULL,
`col3` varchar(3) DEFAULT NULL,
`col4` varchar(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
UNIQUE KEY `col1` (`col1`,`col2`),
UNIQUE KEY `col2` (`col2`,`col1`),
UNIQUE KEY `col1_2` (`col1`,`col2`),
UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
UNIQUE KEY `col1_4` (`col1`),
UNIQUE KEY `col1_5` (`col1`),
KEY `idx1` (`col1`,`id`),
KEY `idx2` (`col1`,`col2`),
KEY `idx3` (`col2`,`col1`),
KEY `idx4` (`col1`,`col2`,`col3`),
KEY `idx5` (`col1`,`col2`)
) ENGINE=InnoDB;

While you work on noting down the duplicate indexes in that MySQL table, let me also add some descriptions for duplicate and redundant indexes.

Duplicate index

Duplicate indexes occur when two or more indexes have the same set of columns in the same order. These can occur accidentally due to poor database design or through the use of database management tools that automatically create indexes without checking for duplicates.

Redundant index

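Redundant indexes occur when the columns of one index form a leftmost prefix of another index, for example (col1, col2) alongside (col1, col2, col3). The longer index can serve the same lookups, so while these are not exact duplicates, they can still negatively impact database performance.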

Both duplicate and redundant indexes can waste disk space and slow down write operations. Each additional index requires additional disk space, and inserts, updates, and deletes have to maintain every index on the table. Additionally, such indexes can make it harder for the query optimizer to choose the most efficient index, as it has more options to consider.
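
The two definitions above can be sketched in a few lines of Python. This is only an illustration, not how pt-duplicate-key-checker is implemented: the function name classify is hypothetical, an index counts as redundant here when its columns are a leftmost prefix of another index, and uniqueness, index type, and the clustered-index rule are all ignored.

```python
def classify(first, second):
    """Compare two indexes given as ordered lists of column names.

    Returns "duplicate" when both list the same columns in the same order,
    "left-prefix" when `first` is a strict leading prefix of `second`,
    and None otherwise.
    """
    if first == second:
        return "duplicate"
    if len(first) < len(second) and second[:len(first)] == first:
        return "left-prefix"
    return None


# Column lists copied from the quiz table definition.
idx2 = ["col1", "col2"]
idx3 = ["col2", "col1"]
idx4 = ["col1", "col2", "col3"]
idx5 = ["col1", "col2"]

print(classify(idx2, idx5))  # duplicate
print(classify(idx2, idx4))  # left-prefix
print(classify(idx3, idx4))  # None: same columns, but a different leading order
```

Column order matters in this comparison, which is why idx3 is not reported as a prefix of idx4.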

Test results

Now, I believe you have your list of duplicate keys ready. Let us see what our favorite pt-duplicate-key-checker tells us about the indexes of the table, along with the reasons why they are considered duplicate or redundant.

[root@ip-172-31-82-182 ~]# pt-duplicate-key-checker --databases test --tables table_with_lot_of_trouble
# ########################################################################
# test.table_with_lot_of_trouble
# ########################################################################

# Uniqueness of id ignored because PRIMARY is a duplicate constraint
# id is a duplicate of PRIMARY
# Key definitions:
# UNIQUE KEY `id` (`id`),
# PRIMARY KEY (`id`),
# Column types:
# `id` int not null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `id`;

# Uniqueness of col1_4 ignored because col1_5 is a duplicate constraint
# col1_4 is a duplicate of col1_5
# Key definitions:
# UNIQUE KEY `col1_4` (`col1`),
# UNIQUE KEY `col1_5` (`col1`),
# Column types:
# `col1` varchar(1) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1_4`;

# idx3 is a duplicate of col2
# Key definitions:
# KEY `idx3` (`col2`,`col1`),
# UNIQUE KEY `col2` (`col2`,`col1`),
# Column types:
# `col2` varchar(2) default null
# `col1` varchar(1) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx3`;

# idx4 is a duplicate of col1_3
# Key definitions:
# KEY `idx4` (`col1`,`col2`,`col3`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx4`;

# Uniqueness of col1 ignored because col1_5 is a stronger constraint
# col1 is a left-prefix of col1_3
# Key definitions:
# UNIQUE KEY `col1` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1`;

# Uniqueness of col1_2 ignored because col1_5 is a stronger constraint
# col1_2 is a left-prefix of col1_3
# Key definitions:
# UNIQUE KEY `col1_2` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1_2`;

# idx2 is a left-prefix of col1_3
# Key definitions:
# KEY `idx2` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx2`;

# idx5 is a left-prefix of col1_3
# Key definitions:
# KEY `idx5` (`col1`,`col2`)
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx5`;

# Key idx1 ends with a prefix of the clustered index
# Key definitions:
# KEY `idx1` (`col1`,`id`),
# PRIMARY KEY (`id`),
# Column types:
# `col1` varchar(1) default null
# `id` int not null
# To shorten this duplicate clustered index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx1`, ADD INDEX `idx1` (`col1`);

# ########################################################################
# Summary of indexes
# ########################################################################

# Size Duplicate Indexes 145
# Total Duplicate Indexes 9
# Total Indexes 13

The pt-duplicate-key-checker notes nine duplicate indexes. Could you identify all nine of them? If so, you surely have a good command of database schema design. But I wouldn’t write a blog post just to test how well you match pt-duplicate-key-checker.

There is one more duplicate key that pt-duplicate-key-checker is missing; could you identify it? If so, I encourage you to apply at Percona and give me an opportunity to work with smarter brains.

The duplicate unique keys

For those who couldn’t identify the duplicate index, the unidentified duplicate keys are… (drum roll)…

UNIQUE KEY (col1, col2)
UNIQUE KEY (col2, col1)

It follows logically that if every pair (a, b) is unique, then every pair (b, a) is also unique. Similar to how Peter Parker is to Spiderman and Gangadhar is to Shaktiman, the column set {a, b} is the same as the set {b, a}. As a result, the two unique keys enforce the same uniqueness check twice.

Therefore, a second unique constraint on the same set of columns is unnecessary regardless of column order. This holds for any permutation of the same columns, not only for the two-column keys shown here. To optimize your database, consider dropping the second unique key, or converting it to a non-unique secondary index if queries need that column order.
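
The order-independence argument can be checked with a small Python sketch. The helper is_unique and the sample rows are hypothetical and not taken from the table above, and the sketch ignores NULL handling (MySQL allows repeated NULLs in a unique index).

```python
def is_unique(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical sample rows, for illustration only.
rows = [
    {"col1": "a", "col2": "x"},
    {"col1": "a", "col2": "y"},
    {"col1": "b", "col2": "x"},
]

# Both column orders give the same answer while every pair is distinct.
print(is_unique(rows, ["col1", "col2"]), is_unique(rows, ["col2", "col1"]))

# Add a repeated pair: both column orders now report a violation.
rows.append({"col1": "a", "col2": "x"})
print(is_unique(rows, ["col1", "col2"]), is_unique(rows, ["col2", "col1"]))
```

Whichever order the columns are listed in, the check sees the same pairs of values, so one constraint can never pass where the other fails.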

Since you cannot read through every table definition by hand, I wrote a query to identify duplicate unique indexes:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, GROUP_CONCAT(INDEX_NAME) AS duplic8_UK, COLUMN_NAMES
         FROM (SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
                      GROUP_CONCAT(COLUMN_NAME ORDER BY COLUMN_NAME SEPARATOR ',') AS COLUMN_NAMES
                 FROM information_schema.STATISTICS
                WHERE NON_UNIQUE = 0 AND INDEX_NAME != 'PRIMARY' AND INDEX_TYPE = 'BTREE'
                GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME) X
        GROUP BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAMES
       HAVING COUNT(*) > 1;

+--------------+---------------------------+---------------+--------------+
| TABLE_SCHEMA | TABLE_NAME                | duplic8_UK    | COLUMN_NAMES |
+--------------+---------------------------+---------------+--------------+
| test         | table_with_lot_of_trouble | col1_4,col1_5 | col1         |
| test         | table_with_lot_of_trouble | col1,col2     | col1,col2    |
+--------------+---------------------------+---------------+--------------+
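
The idea behind the query is to sort each unique key's columns by name and then group keys that end up with the same sorted list. A rough Python sketch of that grouping follows; the function duplicate_unique_keys and the key names uk_ab, uk_ba, and uk_abc are hypothetical and used only for illustration.

```python
from collections import defaultdict


def duplicate_unique_keys(unique_keys):
    """Group unique keys by their column set, ignoring column order.

    `unique_keys` maps an index name to its ordered list of columns.
    Only column sets shared by more than one key are returned.
    """
    groups = defaultdict(list)
    for name, columns in unique_keys.items():
        groups[tuple(sorted(columns))].append(name)
    return {cols: names for cols, names in groups.items() if len(names) > 1}


# Hypothetical unique keys, named for illustration only.
unique_keys = {
    "uk_ab": ["a", "b"],
    "uk_ba": ["b", "a"],
    "uk_abc": ["a", "b", "c"],
}

print(duplicate_unique_keys(unique_keys))
# {('a', 'b'): ['uk_ab', 'uk_ba']}
```

Sorting the column names plays the same role as the ORDER BY inside GROUP_CONCAT in the query: it makes (a, b) and (b, a) compare as equal.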

Also, don’t forget to share your opinion in the comments section: Should this gap in pt-duplicate-key-checker be treated as a bug or as a feature request?

Conclusion

Percona’s pt-duplicate-key-checker is an amazing tool, but like every other tool, it is not “fool-proof.” While you create your indexes, evaluate them for duplication.
