Find unused indexes

I wrote one week ago about how to find duplicate indexes. This time we’ll learn how to find unused indexes to continue improving our schema and the overall performance. There are different possibilites and we’ll explore the two most common here. User Statistics from Percona Server and pt-index-usage.

User Statistics

User Statistics is an improvement on Percona Server that adds some tables to Information Schema with useful information to understand the server activity and identify the source of the load. Those tables store statistics from our users, tables, indexes and so on, information very helpful to have a clear picture of our users are doing in the database.

To accomplish our task of finding unused indexes we’re going to use INFORMATION_SCHEMA.INDEX_STATISTICS but first we should enable User Statistics:

mysql> SET GLOBAL userstat=on;

After all this hard work we have the statistics plugin gathering information. I usually enable it during load peaks to get realistic information about what is happening in the server. To generate some load on the database I’m going to use the benchmark software tpcc-mysql.

After some runs we can start to analyze the information on INDEX_STATISTICS table.

| tpcc         | orders     | PRIMARY      |     59472 |
| tpcc         | customer   | PRIMARY      |    128652 |
| tpcc         | stock      | PRIMARY      |   1128357 |
| tpcc         | district   | PRIMARY      |    116107 |
| tpcc         | item       | PRIMARY      |    282870 |
| tpcc         | warehouse  | PRIMARY      |     84959 |
| tpcc         | orders     | idx_orders   |      2832 |
| tpcc         | customer   | idx_customer |    175240 |
| tpcc         | new_orders | PRIMARY      |     56202 |
| tpcc         | order_line | PRIMARY      |   1164682 |

These are the index used during the benchmark. Let’s compare it with the data on INNODB_INDEX_STATS to get the list of unused indexes:

| stock      | fkey_stock_2      |
| order_line | fkey_order_line_2 |
| history    | fkey_history_1    |
| history    | fkey_history_2    |

Here we see one of the limitations on User Statistics, it can only track direct lookups on index keys but not lookups done during constraint checks. That’s the reason FK are marked as unused so that means all the other indexes has been used during the benchmark.

The second limitation is with partitioned tables, you can’t get index statistics usage from those tables.


The previous example only works with Percona Server so with stock MySQL we should try another approach.

Percona Toolkit has a tool to check the usage of our index from the slow query log. The concept is easy to understand. pt-index-usage reads the slow query log and execute every query with EXPLAIN to ask MySQL which indexes would it use. At the end of the process you’ll have an output with a list of not used indexes. This is an example:

root@debian:/var/log# pt-index-usage slow.log
slow.log:  11% 03:58 remain
slow.log:  21% 03:43 remain
slow.log:  32% 03:09 remain
ALTER TABLE `tpcc`.`order_line` DROP KEY `fkey_order_line_2`; -- type:non-unique
ALTER TABLE `tpcc`.`orders` DROP KEY `idx_orders`; -- type:non-unique
ALTER TABLE `tpcc`.`stock` DROP KEY `fkey_stock_2`; -- type:non-unique

If you compare these results with the previous example, there are some differences and these are some of the reasons:

    • pt-index-usage uses EXPLAIN and User Statistics number of rows read for every index. So we’re comparing an estimation with real data.
    • EXPLAIN, as I said before, is an estimation from the optimizer and sometimes the real query can use a different execution plan.
    • pt-index-usage tries to convert non-SELECT queries to SELECT queries and is not always possible to get a perfect conversion so there can be some differences.
  • Be cautious with Unique indexes, because maybe they are not used for index lookups but your application may need them to avoid duplicates on the column.

    Take in account that depending on the size of the log the server would need lot of process time and cpu power so I recommend you to not use in the production server. Run it on a slave or a testing environment with an similar data estructure and size.


    We’ve learnt how to find unused indexes using different approaches and we’ve also learnt that we should’t follow tool’s suggestion literally. These tools are here to help us and after that is our job to check and benchmark the changes on our testing environment to measure the impact on performance. You won’t need to drop all such indexes but it gives you an idea of which of them you should evaluate. Good tools to help us on that evaluation are pt-log-player and pt-upgrade.

    It’s worth to mention the importance of taking a long enough period of logs or statistics to analyze all relevant workload. In some cases there are processes that run once a day or a week and we should have that information in our analysis report.


    Percona Server 5.5.25-27.0 release delayed

    Due to a bug in upstream MySQL ( http://bugs.mysql.com/bug.php?id=65745 ), we’ve decided to delay the release of the next Percona Server release until it is resolved. We had originally expected to release Percona Server 5.5.25-27.0 today and instead I’m informing you that we are not.

    Quoting Hartmut, who reported the bug:

    An UPDATE that should only modify a single row runs forever and blows up table / tablespace size until there is no more disk space available.

    This isn’t something we want to expose our users to.

    Oracle now lists the latest MySQL 5.5 version on their download site ( http://dev.mysql.com/downloads/mysql/5.5.html ) as 5.5.24, which is the previous release.

    We will keep our 5.5.25 binaries available in our EXPERIMENTAL repository and download site, but will delay our final 5.5.25 release until a resolution to this bug is available.


    MySQL Upgrade Webinar Questions Followup

    I did a Webinar about MySQL Upgrade – Best Practices Yesterday and there were some questions we could not answer during Webinar, following Jay’s Lead I decided to post them as a Blog Post.

    Q: Can you go directly MySQL 5.0 to 5.5 for MyISAM tables?
    MyISAM have not been getting any significant development since MySQL 4.1, so in-place upgrade of MySQL 5.0 to MySQL 5.5 should be rather safe from data standpoint. There are still possibilities for sorting order and data type related changes, so you still need to run mysql_upgrade to check if any tables need to be rebuilt for new versions. Note this only applies to the data part of upgrade, you still need to keep into account possible query changes, replication etc.

    Q: Is it possible to move MyISAM to Innodb through plain ALTER TABLE statements?
    Yes. This is exactly how it is done in many cases. There is really no gain in dumping table via mysqldump and loading it back. If performance is of importance I would recommend using Percona Server with enabled expand_fast_index_creation feature, as I just wrote this option can help creating tables a lot faster. You can also use pt-online-schema-change tool for online migration.

    Q: Should we change ROW replication to MIXED during Upgrade?
    I prefer simple solutions when possible, which is either using STATEMENT or ROW replication. MIXED mode means you will have to deal with STATEMENT and ROW replication issues at the same time. I would only recommend MIXED replication mode if you have something substantial to gain.

    Q: How would Upgrade work if Old Server has Antelope format and New one is Barracuda?
    It should not be the problem. Innodb internal format is rather low level and it rarely impacts application functionality wise. As long a you go over normal upgrade checks you should be fine.

    Q: What are the key concerns upgrading MySQL 4.1 to MySQL 5.5?
    This is jumping over 2 Major versions and many years of development so it is the road less traveled and something not really tested well. One of specific issues might be making MySQL Replication to work. You might need to inject MySQL 5.0 in between so it will convert binary log events, as described here yet it does not guaranty replication will work for any application. There is just too many events. In the worse case you might need to do staged upgrades for your replicated setup or plan some downtime for upgrade.

    Q: Upgrading to MySQL 5.5 do we really need all tables changed to Innodb and character set set to utf8?
    No. MySQL 5.5 continues to support MyISAM tables, so you can continue use them as needed. I would evaluate changing though as MyISAM tables are on track to be depreciated in future MySQL versions also MyISAM tables are essentially at the state they were in MySQL 4.1 while Innodb storage engine got dramatic performance improvements and is getting better in each new release. Also note the default storage engine changed to Innodb in MySQL 5.5, meaning you might need to set storage_engine=MYISAM if you’re relaying on tables to be created as MyISAM by default. Regarding setting character set to utf8 – this is not needed as MySQL 5.5 continues to support various character sets same as previous MySQL versions. Moreover utf8 character set continues to cause significant performance overhead so you should only use it in case you’re storing multi language data.

    Q: What to do if Query Execution Plan changes for the worst during Upgrade?
    This is one of the big challenges with upgrade which can’t be resolved in all cases – sometimes queries just need to be changed or supplemented with hints to get to old execution plans. In many cases though you can get to former execution plan either by updating statistics (ANALYZE TABLE) or disabling some of the optimization/statistics related features in new version. Innodb statistics can especially be the problem as they can be very volatile and sometimes it is hard to get them to be same as in old version.

    Q: What is about running 2 instances of MySQL with different versions to test upgrade?
    This may or may not be good idea. In small environments when we often care more about functionality than performance this can be great way to validate the application is working without significant extra expenses. It also can be used for testing if hardware if there is little extra hardware available. If you’re doing performance testing though you need to ensure only one
    instance is running at the same time so other has full resources (including memory) available. Another thing to mention having multiple instances of different MySQL versions on the same box typically means you will install them from binary distribution instead of packages (most typically used in production) which adds the difference compared to your production use. If resources allow I prefer to have upgrade testing done in the environment which is maximally close to production.

    Q: How safe is it to upgrade from MySQL 5.5 to Percona Server 5.5
    Compatibility is our highest priority with Percona Server, so it should be good as drop in replacement with no data conversion needed. You can also rollback easily unless you have enabled any special Percona Server features which are marked as making database binary incompatible with MySQL. Our focus on compatibility is indeed so high there are number of great features which are disabled by default to make sure you do not start relying on them by accident, so make sure to check out our Documentation to get most our of your Percona Server install.

    Thank you for your questions and for attending. See you at our other MySQL Webinars!


    ALTER TABLE: Creating Index by Sort and Buffer Pool Size

    Today I was looking at the ALTER TABLE performance with fast index creation and without it with different buffer pool sizes. Results are pretty interesting. I used modified Sysbench table for these tests because original table as initially created only has index on column K which initially contains only zeros, which means index is very small and also very fast to build by insertion as insertions happen in the “end” of index tree. I’ve updated column to have bunch of long random strings update sbtest set c=concat(sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand())); and added key on column C: alter table sbtest add key c(c); The box I’m using for test is rather old box with 8GB of RAM and 4*SATA disks in RAID10. I used 10mil row table which would look as following in terms of data and index size:

    mysql> show table status like "sbtest" \G
    *************************** 1. row ***************************
               Name: sbtest
             Engine: InnoDB
            Version: 10
         Row_format: Compact
               Rows: 10000060
     Avg_row_length: 224
        Data_length: 2247098368
    Max_data_length: 0
       Index_length: 1460322304
          Data_free: 7340032
     Auto_increment: 10000001
        Create_time: 2012-06-27 13:04:56
        Update_time: NULL
         Check_time: NULL
          Collation: latin1_swedish_ci
           Checksum: NULL
    1 row in set (0.00 sec)

    So we have about 3.7GB total size out of which 1.4GB is the Index.

    First Lets look at results of Adding and Dropping column for 128M buffer pool, sized to illustrate the case of index being significantly larger than amount of memory.

    Blue Columns in this case is the default configuration, second is with expand_fast_index_creation=1 which builds table with primary key only when creating a table and when builds all indexes by sort. The speedup we can observe is over 20 times. Note this benefit only applies to the case when you’re adding/dropping/changing columns as if you just add index it will be done by sort even without this additional Percona Server feature.

    Lets now see with 6GB buffer pool which is large enough to fit complete table with index (but is a bit too small to keep 2 copies) In this case we’ll see the following picture:

    Comparing these two graphs we can see few different observations:

    Buffer Pool Size Matters There is 10x difference between the case when index fits in memory to when it does not. The difference can be even larger in the real world. Working with indexing, such as inserting data in random order can become very slow when index does not fit to memory any more.

    Creating Index by sort is great Creating index by sort can provide over 20x performance improvement to the speed of ALTER TABLE, as this is not only step which ALTER TABLE does the improvement to the index creation speed itself has to be even larger. It is also offers better performance for in memory case, even though in this case the gain is just about 2x for this workload. Still substantial.

    Sorting needs OS Cache It is not clearly seen from the graphs but with index built by sort enabled the performance of ALTER TABLE with 128MB was better than with 6GB. It looks like this does not make sense but in fact it does if you think about building index stage. On this box with 8GB of memory allocating 6GB to buffer pool made very little memory available for OS cache, so index build process required more IO than if only 128MB were used for buffer pool. The difference was rather small – less than 10% though, which is because disks are not that bad with sequential IO, which is what file merge process uses. This is important illustration as in many cases when people see performance reduction with large buffer pool they forget what bottleneck might be somewhere else where memory resources might be just needed more.

    Take Away: If you’re having large tables and need to run ALTER TABLE which rebuilds the table or OPTIMIZE TABLE do not forget to enable expand_fast_index_creation it can make a whole process a lot faster.

    P.S The tests were done with Percona Server 5.5.24-26.0


    Announcing Percona Live MySQL London

    Looking for a great opportunity to share your MySQL knowledge with the MySQL community? Here is your chance.

    Percona is happy to announce that the dates of the Percona Live MySQL London Conference have been set for Dec. 3-4, 2012. Call for speakers is underway and will be open until August 17th, 2012. The conference committee will review proposals that fit into the likely conference tracks of:

    • Developing Applications
    • Database Administration
    • Trends in Architecture and Design
    • Tools
    • Utilizing Hardware
    • New Features
    • Best Practices for Businesses
    • Replication
    • High Availability Strategies

    Each session is 50 minutes in length, including a Q&A period.

    The conference will also include tutorial sessions. Tutorials should present immediate and practical applications of in-depth MySQL knowledge targeted at a level between a training class and a conference breakout session. Tutorials are three or six hours in length including a Q&A period.

    Submit a speaker proposal by visiting the Percona Live London 2012 website:

    When I design talks for conferences like Percona Live, my goal is to give the audience some new and practical technical knowledge.  I want them to leave after my talk saying, “I can’t wait to get back to my office and try that.”

    Other types of talks are great, too.  A talk can show the results of some deep benchmarking, or discuss internals that give insight into how to optimize or troubleshoot a given scenario.  A success story can present how a particular project approached a problem, and how they used a particular combination of technology to good effect.  Another can demo some new solution in development, giving attendees a preview of things to come.

    All of these types of talks are welcome at Percona Live, and that’s what makes these events the type where attendees have a hard time deciding which room to be in, and how they can get more of their colleagues to attend.  So join up now, and submit a speaker proposal!


    Announcing Percona XtraBackup 2.0.1

    Percona is glad to announce the release of Percona XtraBackup 2.0.1 on June 25th, 2012. Downloads are available from our download site and Percona Software Repositories.

    This release is the current GA (Generally Available) stable release in the 2.0 series.

    This release contains over 20 fixed bugs, some of the more significant bugs fixed are:

    • After creating a full compressed backup, performing a compressed/uncompressed incremental backup would fail because xtrabackup_checkpoints was compressed. This has been fixed by omitting xtrabackup_checkpoints from compression, so that a full backup could be used for incremental backups without decompression. Bug fixed #977652 (Alexey Kopytov).
    • Streaming backups with –stream=tar would fail if the file size was bigger than 8GB. Fixed by changing the libarchive format from USTAR to restricted PAX which supports bigger file sizes. Bug fixed #977998 (Alexey Kopytov).
    • innobackupex was calling the tar utility unconditionally when streaming ib_lru_dump and xtrabackup_galera_info. Which led to a broken stream when the xbstream format was used. Bug fixed #983720 (Sergei Glushchenko).
    • XtraBackup binary could fail with the “log block checksum mismatch” error when reading the partially written log block. Bug fixed #1015416 (Alexey Kopytov).

    Release notes with all the bugfixes for Percona XtraBackup 2.0.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.


    Where Percona software goes before it’s released

    Where does software go while it’s waiting for release? The answer is: our experimental repository!

    For anywhere from a few days to a week before we release them, we stage our software releases in the TESTING download area and in separate software repositories for Debian/Ubuntu and CentOS/RHEL. This is so that we can co-ordinate things such as release notes on the web site, blog posts and to ensure that we have a safe testing ground before we unleash them to the world.

    You can always get Percona Server and Percona XtraBackup TESTING binaries from:

    For example, if you went there currently you’d find the upcoming Percona XtraBackup 2.0.1 release due later this week.



    DROP TABLE and stalls: Lazy Drop Table in Percona Server and the new fixes in MySQL

    Suppose you have turned on innodb_file_per_table (which means that each table has its own tablespace), and you have to drop tables in a background every hour or every day. If its once every day then you can probably schedule the table dropping process to run during off-peak hours. But I have seen cases where the tables had to be dropped more frequently (like every other hour), or when there was no such thing as off-peak hours, in such cases you need consistent performance. But dropping tables is known to cause stalls that exhibit themselves as reduced QPS during the table drops or stalls. Percona Server since version 5.1 has a feature know as “Lazy Drop Table” that alleviates the problem to a great extent but does not get rid of it completely. In the new releases of MySQL (versions >= 5.5.23) work has been done on reducing the stalls caused by DROP TABLE, these have been done in the form of fixes for bug 51325 and bug 64284.

    So what is this blog post about? In this blog post I am going to give a high level picture of how Lazy Drop Table works and how the new DROP Table works in MySQL >= 5.5.23. After the explanation I will share the results of benchmarks that I ran to compare both the implementations. Note that whatever we are discussing here holds for when you are using innodb_file_per_table.


    Before going into showing how Lazy Drop Table and the new DROP Table in MySQL >= 5.5.23 work, I would like to mention quickly, that how drop table in MySQL < 5.5.23 used to work. When dropping the table, a mutex on bufferpool would be held for the entire operation and then two scans of the LRU list would be done invalidating pages that belong to the table being dropped. Obviously this is going to take time which depends on the number of pages in the buffer pool at that time, and the longer this takes, the more stalls you have because, importantly, the bufferpool mutex is held for the entire duration. Now let's keep this old behavior of DROP TABLE as a baseline, and let's continue to look at the implementations.

    Lazy Drop Table in Percona Server

    The main function that is responsible for cleaning the bufferpool in the event of drop table is buf_LRU_mark_space_was_deleted(space_id), here space_id is the id of the tablespace corresponding to the table being dropped. Following are the steps performed in this function:

    • Take a mutex on the LRU list of the buffer pool
    • Scan through the LRU list and for each page in the LRU list:
      • If the page belongs to the tablespace being dropped, then set a flag in the page structure marking the page as belonging to a tablespace being dropped
    • Exit the mutex on the LRU list
    • Take a reader-writer shared lock on the latch protecting the Adaptive Hash Index (AHI)
    • Scan the buffer pool and for each page in the buffer pool
    • If the page has a hash index built on it, (meaning the AHI has an entry for this page):
      • Release the shared lock on the latch protecting the AHI
      • Lock the page in exclusive mode
      • Remove all entries in the AHI pointing to the page
      • Release the lock on the page
      • Take the reader-writer shared lock on the latch protecting the AHI again
    • Release the shared lock on the latch protecting the AHI

    Drop Table in Oracle MySQL >= 5.5.23

    The main function that is responsible for cleaning the bufferpool is buf_LRU_flush_or_remove_pages(space_id, BUF_REMOVE_FLUSH_NO_WRITE). Here space_id is the id of the tablespace corresponding to the table being dropped and BUF_REMOVE_FLUSH_NO_WRITE is a flag that is passed to this function meaning that only the entries in the flush_list should be removed. Following are the steps performed in this function:

    • Loop while all dirty pages belonging to the tablespace (being dropped) have been removed from the buffer pool
      • Take the buffer pool mutex
      • Take a mutex on the flush list of the buffer pool
      • Scan the flush list and for each dirty page in the flush list
        • Remove the page from the buffer pool if the page belongs to the tablespace being dropped
        • If we have removed 1024 dirty pages from the buffer pool thus far we release the buffer pool mutex and the flush list mutex for some time
          • Exit the flush list mutex
          • Exit the buffer pool mutex
          • Try to force a context switch by advising the os to give up remainder of the thread’s time slice (this is going to let other threads do things on the buffer pool and prevents the buffer pool mutex from being kept for long)
        • Take the buffer pool mutex and the flush list mutex again
    • Release the flush list mutex
    • Release the buffer pool mutex

    Ok, now that I have shared a high level picture of both the implementations let’s take a look at the most important and major differences. You can see that the most important and the major differences in the two implementations is the use of buffer pool mutex in Oracle MySQL, this mutex is not used by “Lazy Drop Table” implementation. While the MySQL’s version of drop table relies on taking the buffer pool mutex and a mutex on the flush list, “Lazy Drop Table” implementation relies on taking a mutex on the LRU list (which is a list of pages ordered by their last access time) and a shared reader-writer lock on the Adaptive Hash Index latch. The buffer pool mutex is a major point of contention as it is taken at a lot of different places, while the affect of LRU list mutex is cheaper in comparison. However, a lock on the AHI latch can cause contention for writes, since its a shared lock, so while read queries can still access AHI, write queries would have to wait till the latch is unlocked.
    Now about the improvements in Oracle MySQL’s implementation. The good thing is that there is no scan of the LRU list, and pages belonging to the tablespace being dropped are left alone in the LRU list and are evicted by the normal LRU process when these pages age towards the tail of the LRU. The other good thing is that in Oracle MySQL’s implementation the buffer pool mutex and flush list mutex are released after every 1024 pages being evicted. Both these things are an improvement over the previous implementation of DROP TABLE, but do they make enough difference. I feel that the buffer pool mutex and flush list mutex is still going to cause stalls. Well only benchmarks can give us the true picture.


    I have tried to be realistic with the benchmarks. I have not tried to create-and-drop tables as fast as I can in the background, as that is not a realistic benchmark. In real life workload we will not be doing create/drop as fast as possible. Neither would I be looking at QPS of the entire benchmark as a single number, as what most people are often concerned about is uniform performance over a period of time. I would also like to point out one thing about benchmarks – we have been always advising people to look beyond average performance numbers because they almost never really matter in production, it is not a question if average performance is bad but what stalls and pileups you have.

    Ok now some details about the benchmark.
    First of all before running the benchmark, 30 tables were created and populated with 1000 rows, the tables all had a very simple structure:

    CREATE TABLE t_x(i int(11) not null auto_increment primary key)

    And during the benchmark, these 30 tables were then dropped at 30 second intervals in the background, while in the foreground sysbench was run for 15 minutes with 8 threads which would execute read/write queries against a 20M rows table:

    sysbench --test=oltp --oltp-table-size=20000000 --mysql-table-engine=innodb --mysql-socket=/tmp/mysql_sandbox5524.sock --mysql-db=test --oltp-table-name=sbtest_20m --max-time=900 --num-threads=8 --max-requests=0 run

    Note that the tables that were being dropped had been initialized before the sysbench run, so that the table creation time does not skew the results of the benchmark. During the entire run of the benchmark, all the queries were captured in the slow log by setting long_query_time to 0. Then for analysis purposes the queries in the slow log were grouped by their timestamp, giving us a QPS no. for every second of the benchmark run. Finally, average QPS was taken per every five seconds.

    The size of the dataset used is 4.6G, while the buffer pool size is 6G. The value of innodb_max_dirty_pages_pct was set at 90, while the InnoDB log files were sized at 4G. The configuration and dataset sizes were chosen, so that the dataset completely fits in memory, and to prevent checkpointing from skewing the benchmark results. Prior, to running the benchmark, I also made sure that the buffer pool was completely warmed up with the dataset that sysbench will be using entirely in memory. Furthermore I filled up the bufferpool enough to make sure that the server represents one that is in active use. The buffer pool was filled up with 315K pages out of which ~100K were dirty.

    Ok now some details about the hardware. The host has a CPU with two cores, 8G of memory and a software RAID5 array consisting of 4 disks.

    The version of MySQL used for benchmarking pre 5.5.23 DROP TABLE implementation is MySQL 5.5.15, the version of MySQL used for benchmarking the new DROP TABLE implementation is MySQL 5.5.24, and the version of Percona Server used for benchmarking Lazy Drop Table is Percona Server 5.5.24 (innodb_lazy_drop_table=1)

    Now so much for the details of the benchmark, lets see the graphs.

    First of all comes the benchmark results showing the QPS over 5 second periods when “NO” DROP TABLE is running in the background:

    Next comes the benchmark results showing the QPS over 5 second periods with the old DROP TABLE implementation (pre MySQL 5.5.23) and DROP TABLE running in the background:

    Next comes the benchmark results showing the QPS over 5 second periods with the new DROP TABLE implementation (MysQL version >= 5.5.23) and DROP TABLE running in the background:

    Finally comes the benchmark results showing the QPS over 5 second periods with the Lazy Drop Table implementation in Percona Server and DROP TABLE running in the background:

    So we can compare the graphs above, and see that there are dips in QPS whenever table is dropped, MySQL 5.5.23 does show an improvement over MySQL version prior to 5.5.23, as the dips are not as regular and as low as in the old implementation of drop table, but the “Lazy Drop Table” implementation wins here. The dips are there in the case of “Lazy Drop Table” but the dips in QPS are not as low or as pronounced as compared to MySQL.


    The new DROP TABLE related fixes in MySQL >= 5.5.23 have certainly reduced the performance hit, and the performance dips are not as pronounced as in older versions of MySQL. However, Percona Server with “Lazy Drop Table” still fares a lot better than MySQL. The dips are there, but not as low as in MySQL, though there is still room for improvement. I certainly like the idea of not scanning the LRU list at all during the DROP TABLE, and I think this is something that could be implemented in “Lazy Drop Table” to further improve the overall performance during background DROP TABLE operations.


    On Character Sets and Disappearing Tables

    The MySQL manual tells us that regardless of whether or not we use “SET FOREIGN_KEY_CHECKS=0″ before making schema changes, InnoDB will not allow a column referenced by a foreign key constraint to be modified in such a way that the foreign key will reference a column with a mismatched data type. For instance, if we have these two tables:

    CREATE TABLE foo (
      j INT NOT NULL,
    CREATE TABLE bar (
      j INT NOT NULL,

    trying to do something like “ALTER TABLE bar DROP j” or “ALTER TABLE bar MODIFY COLUMN j j SMALLINT NOT NULL” will produce an error unless we first remove the foreign key constraint present in table “foo”. Indeed, if we try it, that’s exactly what happens:

    (root@localhost) [foobar]> ALTER TABLE bar drop j;
    ERROR 1025 (HY000): Error on rename of './foobar/#sql-3c13_11' to './foobar/bar' (errno: 150)

    And that’s exactly what we’d hope would happen, too, lest table “foo” suddenly end up with an unsatisfiable FK reference due to a mismatched data type (non-existent certainly qualifies as mismatched in this case). No harm was done by this operation, which is also what we’d expect: table “foo” and table “bar” remain in their original, unmodified, and unmolested state, and if we really want to make this modification, we can drop the FK constraint and then proceed with the ALTER.  No surprises yet.

    But are there situations in which this operation doesn’t unfold quite so neatly? Obviously, there are, or there’d be little point in this blog post. When is a matching datatype not really a matching datatype? If you guessed that it has something to do with character fields and character sets, then you’d be absolutely correct.  Consider the following 3 tables, each one with a single row inserted into it:

    create table dos1 (
      fkto2   char(32) not null primary key,
      fkto3   char(32) not null,
      fkfrom3 char(32) not null,
      foreign key (fkto3) references dos3(fkfrom1),
      foreign key (fkto2) references dos2(fkfrom1)
    ) engine=innodb character set=utf8;
    create table dos2 (
      fkfrom1 char(32) not null,
      fkto3 char(32) not null primary key,
      index (fkfrom1),
      foreign key (fkto3) references dos3(fkfrom1)
    ) engine=innodb character set=utf8;
    create table dos3 (
      fkfrom1 char(32) not null,
      fkfrom2 char(32) not null,
      fkto1   char(32) not null primary key,
      foreign key (fkto1) references dos1(fkfrom3)
    ) engine=innodb character set=utf8;
    INSERT INTO dos1 VALUES ('value_fk1_to_2', 'value_fk1_to_3', 'value_fk3_to_1');
    INSERT INTO dos2 VALUES ('value_fk1_to_2', 'value_fk2_to_3');
    INSERT INTO dos3 VALUES ('value_fk1_to_1', 'value_fk2_to_3', 'value_fk3_to_1');

    Suppose that we decide that we’ll never have any use for UTF8 in this database at all, and that we want to achieve the best performance possible, so we’re going to convert everything to ASCII. Or perhaps we have some other tables with ASCII identifiers and we’re doing JOINs against tables in this database and seeing that indexes aren’t being used in the way we were expecting. Whatever the reason, we decide to convert these columns to be ASCII.  We might assume that since the underlying data type is remaining as a CHAR(32) that we don’t need to worry about dropping constraints and that we can just modify the table in place.  The first table certainly proceeds well enough:

    (root@localhost) [oops4]> set foreign_key_checks=0;
    Query OK, 0 rows affected (0.00 sec)
    (root@localhost) [oops4]> alter table dos1 change column fkto2 fkto2 char(32) character set ascii not null, change column fkto3 fkto3 char(32) character set ascii not null, change column fkfrom3 fkfrom3 char(32) character set ascii not null;
    Query OK, 1 row affected (0.09 sec)
    Records: 1  Duplicates: 0  Warnings: 0

    But suppose that this was actually a long-running ALTER statement. Maybe we got pre-occupied, or we had to pick up the task the next day, and when we came back, we forgot that we had to re-disable foreign key checking since we were in a new session. We move on to table 2:

    (root@localhost) [oops4]> alter table dos2 change column fkto3 fkto3 char(32) character set ascii not null, change column fkfrom1 fkfrom1 char(32) character set ascii not null;
    ERROR 1025 (HY000): Error on rename of './oops4/#sql-3c13_14' to './oops4/dos2' (errno: 150)

    Oops, we forgot to disable foreign key checking. No problem, right? Not exactly….

    (root@localhost) [oops4]> set foreign_key_checks=0;
    Query OK, 0 rows affected (0.00 sec)
    (root@localhost) [oops4]> alter table dos2 change column fkto3 fkto3 char(32) character set ascii not null, change column fkfrom1 fkfrom1 char(32) character set ascii not null;
    ERROR 1146 (42S02): Table 'oops4.dos2' doesn't exist
    (root@localhost) [oops4]> show tables;
    | Tables_in_oops4 |
    | dos1            |
    | dos3            |
    2 rows in set (0.00 sec)

    “Oops” indeed. Where we once had three tables, there are now only 2.  The “dos2″ table is nowhere to be found.  The good news here is that the data is not completely gone – it still exists in the temporary tables that were created as part of the ALTER operation and can be found in the data directory, but it’s no longer readily accessible by you or your applications, and the database itself is now in an odd state of limbo.  You can’t drop the database, you can’t recreate the “dos2″ table, and you can’t turn the orphaned temporary table back into its former self unless you’re handy with a hex editor and have a good knowledge of InnoDB file formats.  Moreover, trying to modify any of the remaining tables could cause the same thing to happen to them as well.

    (root@localhost) [oops4]> drop database oops4;
    ERROR 1010 (HY000): Error dropping database (can't rmdir './oops4', errno: 39)
    (root@localhost) [oops4]> create table dos2 (i int);
    ERROR 1005 (HY000): Can't create table 'oops4.dos2' (errno: -1)

    Yikes.  Not a good day.  The one thing that you can do, fortunately, is recover the data into a new table, but you’ll need to go take a look inside the InnoDB data directory for this database to get the actual filename of the temporary table rather than the temporary table name that was displayed in the error message.  Once you do that, you can run something like this:

    CREATE TABLE recovery LIKE `#sql2-3c13-14`;
    INSERT INTO recovery SELECT * FROM `#sql2-3c13-14`;

    But your database is still potentially inconsistent, and you can’t explicitly drop those temporary tables or rename them.

    This looks somewhat like closed bug 13378, but I think there’s something of an inconsistency present when MySQL won’t allow us to change an FK-referenced column from INT to SMALLINT, but it will allow us to change from a multi-byte character column to a single-byte character column, so I’ve submitted this as bug report 65701.

    So what did we learn here?

    • Character set mismatches are bad enough when dealing with JOINs, but they can be an absolute nightmare when dealing with FKs.
    • The safest way to modify a column referenced by FKs is to drop the constraints first.
    • Sometimes you won’t know that an operation has failed badly until it’s too late. Just one more reason (as if you needed any more) to keep good backups that are tested regularly.

    Tested under MySQL 5.5.24.


    pt-online-schema-change and default values

    When I’m doing conventional ALTER TABLE in MySQL I can ignore default value and it will be assigned based on the column type. For example this alter table sbtest add column v varchar(100) not null would work even though we do not specify default value. MySQL will assign empty string as default default value for varchar column. This however does not work for pt-online-schema-change:

    root@smt2:~# pt-online-schema-change --execute  --alter="add column v varchar(100) not null" D=sbtest,t=sbtest     Altering `sbtest`.`sbtest`...
    Creating new table...
    Created new table sbtest._sbtest_new OK.
    Altering new table...
    Altered `sbtest`.`_sbtest_new` OK.
    Creating triggers...
    Created triggers OK.
    Copying approximately 10000060 rows...
    Dropping triggers...
    Dropped triggers OK.
    Dropping new table...
    Dropped new table OK.
    `sbtest`.`sbtest` was not altered.
            (in cleanup) Error copying rows from `sbtest`.`sbtest` to `sbtest`.`_sbtest_new`: Copying rows caused a MySQL error 1364:
        Level: Warning
         Code: 1364
      Message: Field 'v' doesn't have a default value
        Query: INSERT LOW_PRIORITY IGNORE INTO `sbtest`.`_sbtest_new` (`id`, `k`, `c`, `pad`) SELECT `id`, `k`, `c`, `pad` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) /*pt-online-schema-change 11617 copy nibble*/
    Dropping triggers...
    Dropped triggers OK.
    `sbtest`.`sbtest` was not altered.

    pt-online-table-change does not try to guess the default value from column type and so it will fail unless default value is specified. To make pt-online-table-change we need to make a small change to our ALTER TABLE statement to explicitly specify default value:

    root@smt2:~# pt-online-schema-change --execute  --alter="add column v varchar(100) not null default ''" D=sbtest,t=sbtest
    Altering `sbtest`.`sbtest`...
    Creating new table...
    Created new table sbtest._sbtest_new OK.
    Altering new table...
    Altered `sbtest`.`_sbtest_new` OK.
    Creating triggers...
    Created triggers OK.
    Copying approximately 10000060 rows...
    Copying `sbtest`.`sbtest`:   5% 08:48 remain
    Copying `sbtest`.`sbtest`:  10% 08:52 remain
    Copying `sbtest`.`sbtest`:  14% 08:42 remain
    Copying `sbtest`.`sbtest`:  19% 08:17 remain
    Copying `sbtest`.`sbtest`:  24% 07:46 remain
    Copying `sbtest`.`sbtest`:  29% 07:13 remain
    Copying `sbtest`.`sbtest`:  33% 06:48 remain
    Copying `sbtest`.`sbtest`:  38% 06:21 remain
    Copying `sbtest`.`sbtest`:  43% 05:54 remain
    Copying `sbtest`.`sbtest`:  47% 05:25 remain
    Copying `sbtest`.`sbtest`:  52% 04:57 remain
    Copying `sbtest`.`sbtest`:  57% 04:23 remain
    Copying `sbtest`.`sbtest`:  62% 03:50 remain
    Copying `sbtest`.`sbtest`:  67% 03:24 remain
    Copying `sbtest`.`sbtest`:  71% 02:56 remain
    Copying `sbtest`.`sbtest`:  76% 02:29 remain
    Copying `sbtest`.`sbtest`:  80% 01:59 remain
    Copying `sbtest`.`sbtest`:  85% 01:30 remain
    Copying `sbtest`.`sbtest`:  90% 01:01 remain
    Copying `sbtest`.`sbtest`:  95% 00:30 remain
    Copied rows OK.
    Swapping tables...
    Swapped original and new tables OK.
    Dropping old table...
    Dropped old table `sbtest`.`_sbtest_old` OK.
    Dropping triggers...
    Dropped triggers OK.
    Successfully altered `sbtest`.`sbtest`.

    Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com