Sep
04
2019
--

Simulating InnoDB Secondary Index Corruption

InnoDB Secondary Index Corruption

InnoDB Secondary Index CorruptionWorking as a support engineer here at Percona is amazing, as you get a variety of requests, ranging from the most trivial questions to questions that require source code review and understanding of the internals of InnoDB, for example.

In our world where High Availability is a must, everything is about being pro-active, and when we need to be reactive we must act fast. To do so we need to ensure we have proper monitoring in place that covers all possible failure scenarios. Unfortunately, that is not always possible and we are always improving and learning as we face new types of issues.

A few days ago one of our customers faced an issue where MySQL identified an InnoDB secondary index corruption and marked that table as corrupted instead of crashing the server. Even though one would think that a single table marked as corrupted is not as bad as an entire server crash, this particular table was key for the whole application, which in the end caused an outage anyway. They wanted to artificially simulate this situation in order to improve their QA / Monitoring.

The Error

2019-08-26T15:47:10.526458Z 2 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace test/tb1 page [page id: space=48, page number=8]. You may have to recover from a backup.
2019-08-26T15:47:10.526465Z 2 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex dc67f35500000008ffffffff000000090000000000 ... 
InnoDB: End of page dump
2019-08-26T15:47:10.594440Z 2 [Note] InnoDB: Uncompressed page, stored checksum in field1 3697800021, calculated checksums for field1: crc32 3697800020/786197600, innodb 812191569, none 3735928559, stored checksum in field2 3697800020, calculated checksums for field2: crc32 3697800020/786197600, innodb 1337770384, none 3735928559,  page LSN 0 3082690, low 4 bytes of LSN at page end 3082690, page number (if stored to page already) 8, space id (if created with >= MySQL-4.1.1 and stored already) 48
InnoDB: Page may be an index page where index id is 49
2019-08-26T15:47:10.594466Z 2 [Note] InnoDB: Index 49 is `b` in table `test`.`tb1`
2019-08-26T15:47:10.594474Z 2 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2019-08-26T15:47:10.594571Z 2 [ERROR] Got error 126 when reading table './test/tb1'

InnoDB will retry the read 100 times, and if it fails on all of the attempts, the server will assert – otherwise the behavior will vary depending on the version you are using. On 5.5 it will consider the table as corrupted while on 5.7 it will mark the table missing on the data dictionary. Either way, from this moment forward, all queries on that particular table will fail.

InnoDB Internals

InnoDB has a defined layout for it’s in disk structures (redo log, .ibd files, …).

If you are interested in learning or poking around, Jeremy Cole has a project to diagram most of the internal InnoDB structures at https://github.com/jeremycole/innodb_diagrams and also a powerful tool to extract data from innodb at https://github.com/jeremycole/innodb_ruby. The intention of this post is to show you how you can do it manually, so you can get more exposure to internals.

To simulate the corruption we will need to read only a small part of each page header. Here is the list and it’s location as per 5.7 source code:

  • Each page is defined by a type – FIL_PAGE_TYPE
  • An index page has a value of 45bf (or 17855 in decimal) – FIL_PAGE_INDEX
  • Checksum is stored from byte/offset 0 to 4 of each page – FIL_PAGE_SPACE_OR_CHKSUM
  • Index ID is stored at byte/offset 28 of each page data – PAGE_INDEX_ID
#define FIL_PAGE_TYPE 24 /*!< file page type: FIL_PAGE_INDEX,..., 
                           2 bytes. */

#define FIL_PAGE_INDEX 17855 /*!< B-tree node */ 

#define FIL_PAGE_SPACE_OR_CHKSUM 0 /*!< in < MySQL-4.0.14 space id the
page belongs to (== 0) but in later
versions the 'new' checksum of the
page */ 

/**************************************************************//**
Gets the index id field of a page.
@return index id */
UNIV_INLINE
index_id_t
btr_page_get_index_id(
/*==================*/
const page_t* page) /*!< in: index page */
{
return(mach_read_from_8(page + PAGE_HEADER + PAGE_INDEX_ID));
} 

#define PAGE_HEADER FSEG_PAGE_DATA /* index page header starts at this

#define FSEG_PAGE_DATA FIL_PAGE_DATA

#define FIL_PAGE_DATA 38U /*!< start of the data on the page */

#define PAGE_INDEX_ID 28 /* index id where the page belongs.
This field should not be written to after
page creation. */

Source code:
storage/innobase/include/fil0fil.h#L507
storage/innobase/include/fil0fil.h#L575 
storage/innobase/include/fil0fil.h#L483
storage/innobase/include/btr0btr.ic#L109
storage/innobase/include/page0page.h#L56
storage/innobase/include/fsp0types.h#L68
storage/innobase/include/fil0fil.h#L560
storage/innobase/include/page0page.h#L82

To extract the above information from an .ibd file I will use a small bash script to read each page from the file and export the information we will need:

#!/bin/bash
FILE=$1

echo "Reading ${FILE}"
PAGE_SIZE=16384
SIZE=$(ls -l ${FILE} | awk '{print $5}')
PAGES=$((SIZE / PAGE_SIZE))
echo "FILE HAS ${PAGES} pages"

for ((page=1; page<=PAGES; page++))
do
  OFFSET=$((page * PAGE_SIZE))
  PAGETYPE_OFFSET=$((OFFSET + 24))
  PAGETYPE=$(xxd -p -l2 -s ${PAGETYPE_OFFSET} ${FILE})
  PAGECHECKSUM_OFFSET=$((OFFSET))
  PAGECHECKSUM=$(xxd -p -l4 -s ${PAGECHECKSUM_OFFSET} ${FILE})
  INDEXID_OFFSET=$((OFFSET + 38 + 28))
  INDEXID=$(xxd -p -l8 -s ${INDEXID_OFFSET} ${FILE})
  echo "${page} at offset ${OFFSET} has type ${PAGETYPE} and checksum ${PAGECHECKSUM} - ID ${INDEXID}"
done

The last piece for the puzzle is to translate the Index ID to our secondary index. To do it we will make usage of the information_schema table:

SELECT i.NAME, HEX(i.INDEX_ID) FROM information_schema.INNODB_SYS_INDEXES as i JOIN information_schema.INNODB_SYS_TABLES t USING (TABLE_ID) WHERE t.NAME = 'DB/TABLE';

Reproducible test case

To reproduce the corruption we will use a test table with one primary key, one secondary index, and 1000 rows:

#create table
mysql -u root -psekret -e "CREATE TABLE `tb1` (   `ID` int(11) NOT NULL,   `b` varchar(100) DEFAULT NULL,   PRIMARY KEY (`ID`),   KEY `b` (`b`) ) ENGINE=InnoDB" test

#populate table
for i in $(seq 1 1000); do mysql -psekret -e "INSERT INTO test.tb1 VALUES (${i}, 'marcelo_$i')"; done

At this point, InnoDB has all the table pages already loaded into InnoDB Buffer Pool. We will force it to read the pages from disk. The most efficient way will be by adding innodb_buffer_pool_load_at_startup=OFF to my.cnf and restart MySQL.

We would now like to check the list of  Index ID’s of test/tb1 table:

mysql -psekret -e "SELECT i.NAME, HEX(i.INDEX_ID) FROM information_schema.INNODB_SYS_INDEXES as i JOIN information_schema.INNODB_SYS_TABLES t USING (TABLE_ID) WHERE t.NAME = 'test/tb1';"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------+-----------------+
| NAME    | HEX(i.INDEX_ID) |
+---------+-----------------+
| PRIMARY | 30              |
| b       | 31              |
+---------+-----------------+

We will be looking for pages which have a type 45bf (FIL_PAGE_INDEX) and Index ID 31. It’s time to execute our bash script to read the .ibd file:

./read.sh /var/lib/mysql/test/tb1.ibd
Reading /var/lib/mysql/test/tb1.ibd
FILE HAS 10 pages
1 at offset 16384 has type 0005 and checksum 50312eaf - ID 0000000000000000
2 at offset 32768 has type 0003 and checksum 563aeb4c - ID ffffffff0000ffff
3 at offset 49152 has type 45bf and checksum 30ad20b7 - ID 0000000000000030
4 at offset 65536 has type 45bf and checksum 96977741 - ID 0000000000000031
5 at offset 81920 has type 45bf and checksum 5aa506c9 - ID 0000000000000030
6 at offset 98304 has type 45bf and checksum 20f3fe3f - ID 0000000000000030
7 at offset 114688 has type 45bf and checksum 081fbb8f - ID 0000000000000030
8 at offset 131072 has type 45bf and checksum dc67f354 - ID 0000000000000031
9 at offset 147456 has type 45bf and checksum 370eb81c - ID 0000000000000031
10 at offset 163840 has type  and checksum  - ID

Let’s get page eight as the page we will be manipulating. It starts at offset 131072 of tb1.ibd, has a page type 45bf, the index ID field is 31, and it has a checksum of dc67f354. To manipulate the page, we will be changing the checksum field to dc67f355.

printf '\xdc\x67\xf3\x55' | dd of=/var/lib/mysql/test/tb1.ibd bs=1 seek=131072 count=4 conv=notrunc
4+0 records in
4+0 records out
4 bytes (4 B) copied, 0.000121983 s, 32.8 kB/s

Now, next time we try to read this particular page, MySQL will identify a corruption because the checksum doesn’t match. We don’t want the server to assert, so we will change the checksum back to its original value right after we attempt to read the table. This way MySQL will only mark this table as corrupted and as missing in the data dictionary:

mysql -psekret -e "SELECT * FROM test.tb1" & sleep 0.1 && printf '\xdc\x67\xf3\x54' | dd of=/var/lib/mysql/test/tb1.ibd bs=1 seek=131072 count=4 conv=notrunc

That is it! We have provoked an InnoDB secondary index corruption and now all subsequent queries will fail. Now you can make sure your monitoring tool / QA process covers this type of situations.

If in your version of MySQL it has only marked the table as corrupted, a simple CHECK TABLE will bring it back to a usable state. If your version of MySQL is reporting the table as missing from the data dictionary, you can just restart MySQL to force it to re-read the table pages again, since we have changed the checksum back to its original value, it won’t report any error at this time.

Never try to change any MySQL file manually on a production environment. This can cause unrecoverable corruption and data loss.

Aug
02
2013
--

Advanced MySQL Query Tuning: Webinar followup Q&A

Thanks to all who attended my “MySQL Query Tuning” webinar on July 24.  If you missed it, you can you can download the slides and also watch the recorded video. Thank you for the excellent questions after the webinar as well. Query tuning is a big topic and, due to the limited time, I had to skip some material, especially some of the monitoring. I would like, however, to answer all the questions I did not get into during the webinar session.

Q: Did you reset the query cache before doing your benchmark on your query? 0.00 seconds sounds too good 

A: (This is in response to a couple of slides where the time showed as 0.00). Yes, MySQL was running with query cache disabled. The 0.00 just means that the query was executed in less than 0.004 sec. MySQL does not show the higher precision if you run the query from mysql monitor. There are a couple of ways to get the exact query times:

  • MySQL 5.0 +: Use “profiling” feature: http://dev.mysql.com/doc/refman/5.5/en/show-profile.html
  • MySQL 5.1 +: Enable the slow query log with microsecond presision and log the query. To log all queries in the slow query log you can temporary set:  long_query_time = 0
  • MySQL 5.6: Use the new performance_schema counters

Here is the profile for an example query, the query shows 0.00 seconds:

mysql> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000064 |
| checking permissions | 0.000003 |
| checking permissions | 0.000006 |
| Opening tables | 0.000019 |
| System lock | 0.000011 |
| init | 0.000031 |
| optimizing | 0.000011 |
| statistics | 0.000014 |
| preparing | 0.000011 |
| executing | 0.000002 |
| Sending data | 0.002161 |
| end | 0.000004 |
| query end | 0.000002 |
| closing tables | 0.000007 |
| freeing items | 0.000012 |
| logging slow query | 0.000001 |
| cleaning up | 0.000002 |
+----------------------+----------+

As we can see, sending data is actually 0.002 seconds.

Q: Do you ever see doing a seminar that shows how to leverage parallelization (openCL or CUDA) with databases and the performance differences?

A:  MySQL does not support it right now. Usually openCL / CUDA does not help with the disk-bounded applications like databases. However, some projects in OLAP space can actually utilize openCL/CUDA, for example, Alenka, is a column store that is massively parallel. Scanning, aggregation, sorting, etc are done in a data flow manner via the CUDA processing.

 Q: Is this possible to use this /covered index for order by – A.R/ with join? For example if we want to use where on table A and sort it by column from table B

A: Unfortunately, MySQL does not support that with the covered index.  MySQL will only use the filter on the where condition (to limit the number of rows) + filesort. However, if we have a limit clause, MySQL may be able to use the index for order by and stop after finding N rows, matching the condition. It may not be faster thou (as I showed during the webinar) and you may have to use index hints to tell mysql to use the exact index (may not be the best approach as in some cases the use of this index may not be the best for this case). Example:

mysql> explain select * from City ct join Country cn on (ct.CountryCode = cn.Code) where Continent = 'North America' order by ct.population desc limit 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ct
type: index
possible_keys: NULL
key: Population
key_len: 4
ref: NULL
rows: 10
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: cn
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: world.ct.CountryCode
rows: 1
Extra: Using where

As we can see, MySQL will use index and avoid “order by”.

Q: Why are Hash Indexes not available for InnoDB engine ? Any plans to bring Hash indexes.

A: InnoDB use Hash Indexes for so called “Adaptive Hash Index” feature.  InnoDB does not  support hash indexes as a normal table index. We are not aware of the Oracle’s InnoDB team plans to bring this feature in.

Please note: MySQL will allow you to use “using hash” keyword when creating an index on InnoDB table. However, it will create a b-tree index instead.

Q: Will foreign key constraints slow down my queries?

A: It may slow down the queries, as InnoDB will have to

  1. Check the foreign key constraint table
  2. Place a shared lock on the row it will read: 

If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires the constraint condition to be checked sets shared record-level locks on the records that it looks at to check the constraint.InnoDB also sets these locks in the case where the constraint fails. (http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html)

Q: How does use of index vary with the number of columns selected in a select query?

If we are talking about the covered index: if we select a column which is not a part of covered index, mysql will not be able to satisfy the query with index only (“using index” in the explain plan). It may be slower, especially if MySQL will have to select large columns and the data is not cached.

In addition, if we select a text or blob column and MySQL will need to create a temporary table, this temporary table will be created ondisk. I’ve described this scenario during the webinar.

The post Advanced MySQL Query Tuning: Webinar followup Q&A appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com