Jan
04
2023
--

Duplicate, Redundant, and Invisible Indexes

Invisible Indexes MySQL

Invisible Indexes MySQLMySQL index is a data structure used to optimize the performance of database queries at the expense of additional writes and storage space to keep the index data structure up to date. It is used to quickly locate data without having to search every row in a table. Indexes can be created using one or more columns of a table, and each index is given a name. Indexes are especially useful for queries that filter results based on columns with a high number of distinct values.

Indexes are useful for our queries, but duplicate, redundant, and unused indexes reduce performance by confusing the optimizer with query plans, requiring the storage engine to maintain, calculate, and update more index statistics, as well as requiring more disk space.

Since MySQL 8.0, indexes can be marked as invisible. In this blog, I will detail how to detect duplicate and underused indexes as well as the new feature of invisible indexes and how it can help with index management.

How to find the duplicate indexes?

pt-duplicate-key-checker is a command-line tool from Percona Toolkit that scans a MySQL database and identifies tables that have duplicate indexes or primary keys. It can help identify potential problems with the schema and provide guidance on how to fix them. For each duplicate key, the tool prints a DROP INDEX statement by default, allowing you to copy-paste the statement into MySQL to get rid of the duplicate key.

For example:

$ pt-duplicate-key-checker --host=localhost --user=percona --ask-pass
Enter password:
# ########################################################################
# mytestdb.authors_test
# ########################################################################

# idx_first_name is a left-prefix of idx_first_name_last_name
# Key definitions:
#   KEY `idx_first_name` (`first_name`),
#   KEY `idx_first_name_last_name` (`first_name`,`last_name`),
# Column types:
#	  `first_name` varchar(50) default null
#	  `last_name` varchar(50) default null
# To remove this duplicate index, execute:
ALTER TABLE `mytestdb`.`authors_test` DROP INDEX `idx_first_name`;

# Key idx_last_name_id ends with a prefix of the clustered index
# Key definitions:
#   KEY `idx_last_name_id` (`last_name`,`id`)
#   PRIMARY KEY (`id`),
# Column types:
#	  `last_name` varchar(50) default null
#	  `id` int not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `mytestdb`.`authors_test` DROP INDEX `idx_last_name_id`, ADD INDEX `idx_last_name_id` (`last_name`);

# ########################################################################
# Summary of indexes
# ########################################################################

# Size Duplicate Indexes   4125830
# Total Duplicate Indexes  2
# Total Indexes            129

How to find the unused indexes?

Unused indexes are indexes that are created in a database but are not being used in any queries. These indexes take up space in the database and can slow down query performance if they are not maintained. Unused indexes should be identified and removed to improve the performance of the database.

We can use the sys.schema unused indexes view to see which indexes have not been used since the last time the MySQL server was restarted.

Example :

mysql> select * from sys.schema_unused_indexes 
       where index_name not like 'fk_%' and object_schema='mytestdb' and object_name='testtable';
+---------------+-------------+----------------+
| object_schema | object_name | index_name     |
+---------------+-------------+----------------+
| mytestdb      | testtable   | idx_first_name |
| mytestdb      | testtable   | idx_last_name  |
+---------------+-------------+----------------+
2 rows in set (0.00 sec)

Since the index idx_first_name was used to retrieve the records, it is not listed.

mysql> select last_name from testtable WHERE first_name=’Arun’;
+-----------+
| last_name |
+-----------+
| Jith      |
+-----------+
1 rows in set (0.00 sec)

mysql> select * from sys.schema_unused_indexes 
       where index_name not like 'fk_%' and object_schema='mytestdb' and object_name='testtable';
+---------------+-------------+---------------+
| object_schema | object_name | index_name    |
+---------------+-------------+---------------+
| mytestdb      | testtable   | idx_last_name |
+---------------+-------------+---------------+
1 row in set (0.01 sec)

Invisible Indexes

In MySQL 8.0, there is a feature that allows you to have an invisible index. This means that an index is created on a table, but the optimizer does not use it by default. By using this feature, you can test the impacts of removing an index without actually dropping it. If desired, the index can be made visible again, therefore avoiding the time-consuming process of re-adding the index to a larger table.

The SET_VAR(optimizer_switch = ‘use_invisible_indexes=on’) allows the invisible index to be used for specific application activities or modules during a single query while preventing it from being used across the entire application.

Let’s look at a simple example to see how it works.

CREATE TABLE `authors` (
  `id` int NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50),
  `last_name` varchar(50),
  `email` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
   PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB;

The indexes are set to visible by default, and we may use the ALTER TABLE table name ALTER INDEX index name to make it INVISIBLE/VISIBLE.

To find the index details, we can use “SHOW INDEXES FROM table;” or query the INFORMATION SCHEMA.STATISTICS.

mysql> SELECT INDEX_NAME, IS_VISIBLE
       FROM INFORMATION_SCHEMA.STATISTICS
       WHERE TABLE_SCHEMA = 'mytestdb' AND TABLE_NAME = 'authors_test';
+------------+------------+
| INDEX_NAME | IS_VISIBLE |
+------------+------------+
| email      | YES        |
| PRIMARY    | YES        |
+------------+------------+
2 rows in set (0.00 sec)

mysql> ALTER TABLE authors_test ALTER INDEX email INVISIBLE;
Query OK, 0 rows affected (0.11 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT INDEX_NAME, IS_VISIBLE
       FROM INFORMATION_SCHEMA.STATISTICS
       WHERE TABLE_SCHEMA = 'mytestdb' AND TABLE_NAME = 'authors_test';
+------------+------------+
| INDEX_NAME | IS_VISIBLE |
+------------+------------+
| email      | NO         |
| PRIMARY    | YES        |
+------------+------------+
2 rows in set (0.00 sec)

The query cannot utilize the index on the email column since it is invisible.

mysql> explain select email from authors_test WHERE email='amanda31@example.org'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: authors_test
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10063
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

The SET VAR optimizer hint can be used to temporarily update the value of an optimizer switch and enable invisible indexes for a single query only, as shown below:

mysql> explain select /*+ SET_VAR(optimizer_switch = 'use_invisible_indexes=on') */ first_name 
               from authors_test WHERE email='amanda31@example.org'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: authors_test
   partitions: NULL
         type: const
possible_keys: email
          key: email
      key_len: 302
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

Please refer to the documentation for more information on invisible indexes.

Summary

Removal of duplicate and unnecessary indexes is suggested to prevent performance degradation in high concurrent workloads with less-than-ideal data distribution on a table. Unwanted keys take up unnecessary disc space, which can cause overhead to DML and read queries. To test the effects of dropping an index without fully removing it, it can be made invisible first.

May
31
2013
--

The small improvements of MySQL 5.6: Duplicate Index Detection

MySQL 5.6Here at the MySQL Performance Blog, we’ve been discussing the several new features that MySQL 5.6 brought: GTID-based replication, InnoDB Fulltext, Memcached integration, a more complete performance schema, online DDL and several other InnoDB and query optimizer improvements. However, I plan to focus on a series of posts on the small but handy improvements – changes and bug corrections – in MySQL 5.6 that can make our lives easier and have passed almost unnoticed by most (not all) DBAs.

Duplicate Index Detection

I commented about this on my last webinar, but did not have time to analyze it in-depth.  If you try to do something like this in MySQL 5.5, you will succeed without errors or warnings:

mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.22 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.25 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> SHOW CREATE TABLE test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col2` int(11) DEFAULT NULL,
  `col3` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `col2` (`col2`),
  KEY `col2_2` (`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

In previous versions of MySQL, you can create two indexes with the same columns (in the same order) and the server will not complain.

If we execute the same sentences in MySQL 5.6, the second ALTER will also succeed -and the index will be created-, but we will get a warning (note severity, to be exact):

mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.36 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected, 1 warning (0.56 sec)
Records: 0  Duplicates: 0  Warnings: 1
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1831
Message: Duplicate index 'col2_2' defined on the table 'test.test'. This is deprecated and will be disallowed in a future release.
1 row in set (0.00 sec)

As the message points correctly, this is a human mistake, as it is a waste of resources that could potentially impact our performance, and should be avoided. By the way, a good practice to avoid this is always naming your keys with a consistent pattern. This new behavior was introduced in 5.6.7 with the closing of this bug (although this was initially accepted as a bug as early as 2005!).

The report explains more in detail what the “will be disallowed in a future release” means. In MySQL 5.7 the checks will be stricter: in the default SQL mode, a duplicate index will throw a warning instead of a note. In strict mode, it will throw an error and the second ALTER will fail, preventing the creation of the duplicate index.

Does it mean that tools like pt-duplicate-key-checker will not be necessary for MySQL 5.6? Let’s have a look at the code implementing this feature. The warning will only be thrown if the index has not been created automatically, it is not a foreign key, and it has the exact column definition in the same order. In other words, it checks for duplicate keys, but not redundant ones. What is the difference? Let’s see an example. If we execute:

mysql> ALTER TABLE test ADD INDEX (col2);
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> ALTER TABLE test ADD INDEX (col2, col3);
Query OK, 0 rows affected (0.39 sec)
Records: 0  Duplicates: 0  Warnings: 0

We get no warnings and no errors in 5.6, as the indexes are different. But as you may know, we can use the second index not only for filtering on both columns, but also for filtering by just the first one. Why can’t MySQL enforce this kind of constraints? For many reasons: the first one because it would break 99% of all applications out there that use MySQL, for which we at Percona tend to find redundant indexes. And second, because in some cases, we may need to have what at first seems redundant indexes but effectively they are not -for example, if one of the two indexes was unique or a foreign key.

This is the output of our tool when run on the same table, correctly identifying the redundancy:

$ pt-duplicate-key-checker --tables test.test
# ########################################################################
# test.test
# ########################################################################
# col2 is a left-prefix of col2_2
# Key definitions:
#   KEY `col2` (`col2`),
#   KEY `col2_2` (`col2`,`col3`)
# Column types:
#         `col2` int(11) default null
#         `col3` varchar(200) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`test` DROP INDEX `col2`;
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   5
# Total Duplicate Indexes  1
# Total Indexes            3

Additionally, pt-duplicate-key-checker will detect subtle redundancies that are engine-dependent, like redundant suffixes for secondary keys in InnoDB. As some of this redundancies could be necessary, depending on the query optimizer and the MySQL version, we always recommend to check manually the optimizations proposed by Percona Toolkit. The MySQL server, of course, cannot risk to block directly all cases.

A set of MySQL utilities were introduced by Oracle recently, which includes mysqlindexcheck, similar to pt-duplicate-key-checker, but it does not detect all cases. For example:

mysql> alter table test add index redundant (col2, id);
Query OK, 0 rows affected (1.57 sec)
Records: 0  Duplicates: 0  Warnings: 0
$ mysqlindexcheck --server=user:pass@localhost test.test
# Source on localhost: ... connected.
$ pt-duplicate-key-checker --tables test.test
# ########################################################################
# test.test
# ########################################################################
# Key redundant ends with a prefix of the clustered index
# Key definitions:
#   KEY `redundant` (`col2`,`id`)
#   PRIMARY KEY (`id`),
# Column types:
#         `col2` int(11) default null
#         `id` int(11) not null auto_increment
# To shorten this duplicate clustered index, execute:
ALTER TABLE `test`.`test` DROP INDEX `redundant`, ADD INDEX `redundant` (`col2`);
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   9
# Total Duplicate Indexes  1
# Total Indexes            2

By the way, if you want to get more familiar with this and other particularities of the latest MySQL GA release, have a look at our upcoming sessions for the “Moving to 5.6″ training course in America (Austin, San Jose) and Europe (Manchester, Utrecht).

The post The small improvements of MySQL 5.6: Duplicate Index Detection appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com