Jul 26, 2013

Checking B+tree leaf nodes list consistency in InnoDB

Given an InnoDB page, there are two ways to learn how many records it contains:

  • PAGE_N_RECS field in the page header
  • Count records while walking over the list of records from infimum to supremum

In a previous revision of the recovery tool a short per-page summary was added to the dump produced by constraints_parser.

But if a page is lost and page_parser hasn’t found it, all records from that page are lost with it. In other words, per-page recovery statistics give us little idea about whether or not a recovered table is complete.

To close this gap, a new tool, index_chk, was introduced in revision 80.

As you might know, InnoDB stores a table in a clustered index called PRIMARY.

The PRIMARY index is a B+tree structure. It has a nice property: every leaf node stores a pointer to the next leaf node, so by following the pointers we can read the whole index in primary key order. InnoDB extends the structure and also stores a pointer to the previous node. The previous pointer of the head is NULL, as is the next pointer of the tail.

Based on this knowledge we can check whether we have all elements of the list: if from any InnoDB page we can reach both the beginning and the end of the list, then our set of pages is complete.

This is exactly what index_chk does – it reads files from a directory with InnoDB pages (produced by page_parser) and tries to walk the list of pages back and forth.
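
index_chk itself is written in C, but the idea is simple enough to sketch in a few lines of Python. The snippet below is not the real index_chk code, only an illustration under a couple of assumptions: the pages are in the standard uncompressed format, the FIL header is the first 38 bytes of a page, FIL_PAGE_OFFSET/FIL_PAGE_PREV/FIL_PAGE_NEXT are big-endian 4-byte fields at offsets 4, 8 and 12, and 0xFFFFFFFF means “no neighbour”. Instead of literally walking back and forth it checks that every page referenced as a neighbour is present in the directory, which detects the same kind of hole.

import os
import struct
import sys

FIL_PAGE_OFFSET = 4    # this page's number
FIL_PAGE_PREV   = 8    # page number of the previous page on the same B+tree level
FIL_PAGE_NEXT   = 12   # page number of the next page on the same B+tree level
NULL_PAGE       = 0xFFFFFFFF

def read_links(path):
    with open(path, "rb") as f:
        header = f.read(38)                    # the FIL header is 38 bytes
    # three consecutive big-endian 4-byte integers starting at offset 4
    return struct.unpack_from(">3I", header, FIL_PAGE_OFFSET)

def check_index_dir(directory):
    pages = {}
    for name in os.listdir(directory):
        if name.endswith(".page"):
            page_no, prev, nxt = read_links(os.path.join(directory, name))
            pages[page_no] = (prev, nxt)
    broken = False
    for page_no, (prev, nxt) in sorted(pages.items()):
        for neighbour in (prev, nxt):
            if neighbour != NULL_PAGE and neighbour not in pages:
                print("page %u points to page %u, which is missing" % (page_no, neighbour))
                broken = True
    print("OK" if not broken else "page list is incomplete")

check_index_dir(sys.argv[1])    # e.g. pages-1374669586/FIL_PAGE_INDEX/0-410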

Let’s review an example. I took a corrupt InnoDB tablespace and split it with page_parser:

# ./page_parser -f /var/lib/mysql/ibdata1
Opening file: /var/lib/mysql/ibdata1
File information:
ID of device containing file:        64512
inode number:                     27037954
protection:                         100660 (regular file)
number of hard links:                   1
user ID of owner:                      107
group ID of owner:                     116
device ID (if special file):             0
blocksize for filesystem I/O:         4096
number of blocks allocated:          86016
time of last access:            1374662334 Wed Jul 24 06:38:54 2013
time of last modification:      1374233938 Fri Jul 19 07:38:58 2013
time of last status change:     1374233938 Fri Jul 19 07:38:58 2013
total size, in bytes:             44040192 (42.000 MiB)
Size to process:                  44040192 (42.000 MiB)
8.26% done. 2013-07-24 08:39:58 ETA(in 00:00 hours). Processing speed: 3637248 B/sec
...
95.70% done. 2013-07-24 08:40:09 ETA(in 00:00 hours). Processing speed: 4399053 B/sec
#

Now let’s take some index and check if it has all pages:

# ls pages-1374669586/FIL_PAGE_INDEX/0-410/
00000000-00000145.page  00000000-00000235.page  00000000-00000241.page  00000000-00000247.page  00000000-00000254.page
00000000-00000147.page  00000000-00000236.page  00000000-00000243.page  00000000-00000249.page  00000000-00000255.page
00000000-00000148.page  00000000-00000239.page  00000000-00000244.page  00000000-00000251.page
# ./index_chk -f pages-1374669586/FIL_PAGE_INDEX/0-410
Couldn't open file pages-1374669586/FIL_PAGE_INDEX/0-410/00000000-00000140.page
#

Bad news: the page with id 140 is missing!

Indeed, the previous page of page #145 is page #140, but it’s missing. We can see this in the FIL header of page #145: bytes 4-7 are FIL_PAGE_OFFSET (0x00000091 = 145), bytes 8-11 are FIL_PAGE_PREV (0x0000008C = 140), and FIL_PAGE_NEXT in bytes 12-15 is 0xFFFFFFFF, i.e. NULL:

# hexdump -C pages-1374669586/FIL_PAGE_INDEX/0-410/00000000-00000145.page | head -2
00000000  d4 cd 68 41 00 00 00 91  00 00 00 8c ff ff ff ff  |..hA............|
00000010  00 00 00 00 2b 6c ea 90  45 bf 00 00 00 00 00 00  |....+l..E.......|
#

For a cheerful ending let’s check an index that has all pages:

# ls pages-1374669586/FIL_PAGE_INDEX/0-2/* # this is SYS_COLUMNS table
pages-1374669586/FIL_PAGE_INDEX/0-2/00000000-00000010.page  pages-1374669586/FIL_PAGE_INDEX/0-2/00000000-00002253.page
pages-1374669586/FIL_PAGE_INDEX/0-2/00000000-00002244.page
# ./index_chk -f pages-1374669586/FIL_PAGE_INDEX/0-2
OK
#

Thus, a table is fully recovered if and only if two conditions are met:

  • The command grep "Lost records: YES" table.dump | grep -v "Leaf page: NO" outputs nothing
  • ./index_chk -f pages-1374669586/FIL_PAGE_INDEX/<index_id of my table> reports OK


Apr 25, 2013

Finally. How to verify if all MySQL records were recovered

After nearly every recovery case the same question arises: how many MySQL records were recovered and how many were lost?

Until now there was no way to answer that question without manual investigation. As it turned out, a small change can make a big difference.

There are two ways to know how many records an InnoDB page stores. The index page header has a field PAGE_N_RECS – the number of records the page stores. Obviously it doesn’t count any deleted records. The second way to confirm how many records are in the page is to follow the record pointers and count them.

As you might know, records inside an InnoDB page are organized in a unidirectional list. The list is sorted by primary key; it starts with the internal record infimum and ends with another internal record, supremum. If you follow the record pointers from the infimum and end up at the supremum record, then two conclusions can be made:

  1. The page structure is not corrupt
  2. You know how many records to expect

This is what constraints_parser does before it tries to find any records in the page:

./constraints_parser -5f pages-actor/FIL_PAGE_INDEX/0-1599/00000000-00000003.page -V
...
Checking a page
Infimum offset: 0x63
Supremum offset: 0x70
Next record at offset: 0x7F (127)
Next record at offset: 0xA8 (168)
Next record at offset: 0xCE (206)
...
Next record at offset: 0x70 (112)
Page is good
Format: COMPACT, Records list: Valid, Expected records: (200 200)

Here you can see “Expected records: (200 200)” – 200 records according to the page header and 200 records found by following the pointers.
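
The check is easy to reproduce outside the tool. Below is a minimal Python sketch (not the constraints_parser code) that reads PAGE_N_RECS and then counts the records by following the pointers. It assumes a standard uncompressed 16 KB page in COMPACT format, where PAGE_N_RECS is a 2-byte field at offset 38+16, the infimum and supremum record origins are at 0x63 and 0x70, and the 2 bytes right before a record origin hold the relative offset of the next record.

import struct
import sys

PAGE_SIZE   = 16 * 1024
PAGE_N_RECS = 38 + 16    # 2-byte record count in the index page header
INFIMUM     = 0x63       # record origin of infimum in a COMPACT page
SUPREMUM    = 0x70       # record origin of supremum in a COMPACT page

def count_records(path):
    with open(path, "rb") as f:
        page = f.read(PAGE_SIZE)
    n_recs = struct.unpack_from(">H", page, PAGE_N_RECS)[0]
    count, cur = 0, INFIMUM
    while True:
        # the next-record pointer is a 2-byte relative offset stored just before the record origin
        rel = struct.unpack_from(">H", page, cur - 2)[0]
        cur = (cur + rel) & (PAGE_SIZE - 1)
        if cur == SUPREMUM:
            break
        count += 1
        if count > PAGE_SIZE:    # a corrupt record list can loop forever
            return n_recs, None
    return n_recs, count

n_recs, counted = count_records(sys.argv[1])
print("PAGE_N_RECS: %s, records found by pointers: %s" % (n_recs, counted))

If the walk terminates at supremum and both numbers match, the page structure is intact and we know exactly how many records to expect from it.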

Then constraints_parser tries to recover the records. It checks whether fields have permitted values, etc. In the end it knows how many records were actually recovered.

These statistics are printed along with the recovered records:

-- Page id: 3, Format: COMPACT, Records list: Valid, Expected records: (200 200)
000000001917    DB000001720110  actor   1       "PENELOPE"      "GUINESS"       "2006-02-15 04:34:33"
000000001917    DB00000172011B  actor   2       "NICK"  "WAHLBERG"      "2006-02-15 04:34:33"
000000001917    DB000001720126  actor   3       "ED"    "CHASE" "2006-02-15 04:34:33"
000000001917    DB000001720131  actor   4       "JENNIFER"      "DAVIS" "2006-02-15 04:34:33"
...
000000001917    DB0000017209E4  actor   200     "THORA" "TEMPLE"        "2006-02-15 04:34:33"
-- Page id: 3, Found records: 200, Lost records: NO, Leaf page: YES

From the output above we know that 200 records were expected and 200 were recovered, so no records were lost.

It’s worth noting that records are stored in the leaf pages only, so if an index has more than one page, no records will be found in the root page.

For the root page constraints_parser will print “Leaf page: NO”, and it can be ignored.
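
Telling a leaf page from a non-leaf page is a one-field check. Assuming the standard uncompressed page layout, where the index page header follows the 38-byte FIL header and its 2-byte PAGE_LEVEL field sits at offset 26 within that header (64 from the start of the page), a level of 0 means a leaf page. A small sketch, not part of the toolkit:

import struct
import sys

PAGE_LEVEL = 38 + 26    # 2-byte B+tree level of the page; 0 means a leaf page

def is_leaf(path):
    with open(path, "rb") as f:
        header = f.read(PAGE_LEVEL + 2)
    return struct.unpack_from(">H", header, PAGE_LEVEL)[0] == 0

print("Leaf page: %s" % ("YES" if is_leaf(sys.argv[1]) else "NO"))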


Apr 22, 2013

How to recover table structure from InnoDB dictionary

To recover a dropped or corrupt table with the Percona Data Recovery Tool for InnoDB you need two things: media with the records (ibdata1, *.ibd files, a disk image, etc.) and the table structure. Indeed, there is no information about the table structure in an InnoDB page. Normally we either recover the structure from .frm files or take it from some old backup.

A new tool, sys_parser, can recover the table structure from the InnoDB dictionary.

Why do we need a new tool anyway? It is absolutely critical to have an accurate table definition to ensure a successful recovery. Even a difference as easy to overlook as NULL vs. NOT NULL can shift all values by a byte and thus spoil the result. That’s why I prefer the structure from .frm files over one taken from a backup. But in some cases even .frm files are not an option:

  • The table was dropped and innodb_file_per_table is ON
  • The .frm file is corrupt, zeroed out or lost, or SHOW CREATE TABLE crashes MySQL

There is yet another source of information about the table structure – the InnoDB dictionary. Let’s review the dictionary tables and see what they can give us. We will need four of them:

  • SYS_TABLES
  • SYS_INDEXES
  • SYS_COLUMNS
  • SYS_FIELDS

SYS_TABLES
Here InnoDB keeps the correspondence between human-readable table names and their internal identifiers.

CREATE TABLE `SYS_TABLES` (
  `NAME` varchar(255) NOT NULL DEFAULT '',
  `ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `N_COLS` int(10) DEFAULT NULL,
  `TYPE` int(10) unsigned DEFAULT NULL,
  `MIX_ID` bigint(20) unsigned DEFAULT NULL,
  `MIX_LEN` int(10) unsigned DEFAULT NULL,
  `CLUSTER_NAME` varchar(255) DEFAULT NULL,
  `SPACE` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`NAME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

NAME is a human-readable table name in the form database_name/table_name, e.g. sakila/actor. ID is the table identifier. We will need the table ID to find the indexes of the table.

mysql> select * from SYS_TABLES WHERE NAME='sakila/actor';
+--------------+-----+--------+------+--------+---------+--------------+-------+
| NAME         | ID  | N_COLS | TYPE | MIX_ID | MIX_LEN | CLUSTER_NAME | SPACE |
+--------------+-----+--------+------+--------+---------+--------------+-------+
| sakila/actor | 741 |      4 |    1 |      0 |       0 |              |   738 |
+--------------+-----+--------+------+--------+---------+--------------+-------+

SYS_INDEXES
This table lists all indexes the table has, secondary as well as the primary.

CREATE TABLE `SYS_INDEXES` (
  `TABLE_ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `NAME` varchar(120) DEFAULT NULL,
  `N_FIELDS` int(10) unsigned DEFAULT NULL,
  `TYPE` int(10) unsigned DEFAULT NULL,
  `SPACE` int(10) unsigned DEFAULT NULL,
  `PAGE_NO` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`TABLE_ID`,`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

So, TABLE_ID is our table id, and ID here is the index identifier. We need it to find the InnoDB pages which belong to the table’s index. Which one? A table can have many secondary indexes, but only in the primary index can we find all fields, and it must exist for any InnoDB table. If explicitly defined, its NAME is PRIMARY. If the primary key is not defined, InnoDB will use a unique secondary index as the primary. If there is no unique index either, InnoDB will create one implicitly; its name will be GEN_CLUST_INDEX.
No matter how the primary index gets created, it will have the minimal ID among the indexes of the table.

mysql> select * from SYS_INDEXES WHERE TABLE_ID=741;
+----------+------+---------------------+----------+------+-------+---------+
| TABLE_ID | ID   | NAME                | N_FIELDS | TYPE | SPACE | PAGE_NO |
+----------+------+---------------------+----------+------+-------+---------+
|      741 | 1679 | PRIMARY             |        1 |    3 |   738 |       3 |
|      741 | 1680 | idx_actor_last_name |        1 |    0 |   738 |       4 |
+----------+------+---------------------+----------+------+-------+---------+

SYS_COLUMNS
The SYS_COLUMNS table stores the field names and type information of the table.

CREATE TABLE `SYS_COLUMNS` (
  `TABLE_ID` bigint(20) unsigned NOT NULL,
  `POS` int(10) unsigned NOT NULL,
  `NAME` varchar(255) DEFAULT NULL,
  `MTYPE` int(10) unsigned DEFAULT NULL,
  `PRTYPE` int(10) unsigned DEFAULT NULL,
  `LEN` int(10) unsigned DEFAULT NULL,
  `PREC` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`TABLE_ID`,`POS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Here TABLE_ID is the familiar table identifier and POS is the position of a field in the table. NAME is the name of a field; MTYPE and PRTYPE store information about the field type, encoding, NULL/NOT NULL properties, etc.
LEN is the maximum number of bytes a field uses to store a value. I’m not sure what PREC is used for. It sounds like a short form of “precision”, but at least for the DECIMAL type, where it would make sense, it is still zero. If you know how InnoDB uses PREC, please let me know.

mysql> select * from SYS_COLUMNS WHERE TABLE_ID=741;
+----------+-----+-------------+-------+---------+------+------+
| TABLE_ID | POS | NAME        | MTYPE | PRTYPE  | LEN  | PREC |
+----------+-----+-------------+-------+---------+------+------+
|      741 |   0 | actor_id    |     6 |    1794 |    2 |    0 |
|      741 |   1 | first_name  |    12 | 2162959 |  135 |    0 |
|      741 |   2 | last_name   |    12 | 2162959 |  135 |    0 |
|      741 |   3 | last_update |     6 |    1799 |    4 |    0 |
+----------+-----+-------------+-------+---------+------+------+
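
To give an idea of what is packed into PRTYPE, here is a simplified Python decoder for the values above. It is a sketch based on how InnoDB assembles prtype (the MySQL type code in the low byte, 0x100 for NOT NULL, 0x200 for UNSIGNED, and the character set/collation id in the bits above 16), not the sys_parser code, and it only knows about the handful of types used by sakila/actor:

# MySQL type codes found in the low byte of PRTYPE (only the ones we need here)
MYSQL_TYPES = {1: "TINYINT", 2: "SMALLINT", 3: "INT", 7: "TIMESTAMP",
               8: "BIGINT", 15: "VARCHAR", 254: "CHAR"}

def decode_prtype(prtype, length):
    code = prtype & 0xFF
    name = MYSQL_TYPES.get(code, "type #%d" % code)
    charset_coll = prtype >> 16              # character set/collation id; 33 = utf8_general_ci
    if code in (15, 254):                    # string types: LEN is the maximum length in bytes
        mbmaxlen = 3 if charset_coll == 33 else 1
        name += "(%d)" % (length // mbmaxlen)
        if charset_coll == 33:
            name += " CHARACTER SET utf8"
    if prtype & 0x200 and code in (1, 2, 3, 8, 9):   # UNSIGNED flag, shown only for integer types
        name += " UNSIGNED"
    if prtype & 0x100:                       # NOT NULL flag
        name += " NOT NULL"
    return name

# The sakila/actor columns from the query above
for col, prtype, length in (("actor_id", 1794, 2), ("first_name", 2162959, 135),
                            ("last_name", 2162959, 135), ("last_update", 1799, 4)):
    print("%s: %s" % (col, decode_prtype(prtype, length)))

Running it prints SMALLINT UNSIGNED NOT NULL for actor_id, VARCHAR(45) CHARACTER SET utf8 NOT NULL for the two name columns and TIMESTAMP NOT NULL for last_update, which matches the types sys_parser emits at the end of this post.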

So, we know all fields of the table and we can derive their types. Is that enough for the recovery? No.

SYS_FIELDS
We need to know which fields form the primary key. The point is that regardless of the position at which the primary key fields are defined in the CREATE TABLE statement, internally they always go first in a record. The second issue we should take into account is the internal fields DB_TRX_ID and DB_ROLL_PTR. These two fields always reside between the primary key fields and the rest of the fields.
SYS_FIELDS lists the fields of all indexes, including the primary.

CREATE TABLE `SYS_FIELDS` (
  `INDEX_ID` bigint(20) unsigned NOT NULL,
  `POS` int(10) unsigned NOT NULL,
  `COL_NAME` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`INDEX_ID`,`POS`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The field names explain their content.
The index ID in our example is 1679:

mysql> SELECT * FROM SYS_FIELDS WHERE INDEX_ID = 1679;
+----------+-----+----------+
| INDEX_ID | POS | COL_NAME |
+----------+-----+----------+
|     1679 |   0 | actor_id |
+----------+-----+----------+

Which means the primary key of the table consists of the single field actor_id.
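
As a tiny illustration of the ordering rule described above (a sketch, not what sys_parser literally does): the primary key columns go first, then the internal DB_TRX_ID and DB_ROLL_PTR, then the remaining columns in their SYS_COLUMNS order.

def physical_order(columns, pk_columns):
    # PK columns first, then the two internal columns, then everything else in POS order
    rest = [c for c in columns if c not in pk_columns]
    return list(pk_columns) + ["DB_TRX_ID", "DB_ROLL_PTR"] + rest

print(physical_order(["actor_id", "first_name", "last_name", "last_update"], ["actor_id"]))
# ['actor_id', 'DB_TRX_ID', 'DB_ROLL_PTR', 'first_name', 'last_name', 'last_update']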

Now we have all necessary information to generate CREATE TABLE statement.

0. Download the latest revision of the recovery tool

bzr branch lp:percona-data-recovery-tool-for-innodb

1. Compile the dictionary parsers

make dict_parsers

2. Split ibdata1 with page_parser

./page_parser -f /var/lib/mysql/ibdata1

3. Recover SYS_TABLES, SYS_INDEXES, SYS_COLUMNS and SYS_FIELDS from indexes 0-1, 0-3, 0-2 and 0-4 respectively, e.g. for SYS_FIELDS:

./bin/constraints_parser.SYS_FIELDS -4f pages-ibdata1/FIL_PAGE_INDEX/<index_id>

4. Load the dumps of the dictionary tables into some MySQL server, using the LOAD DATA INFILE statements that constraints_parser generates:

mysql>LOAD DATA INFILE '/path/to/SYS_FIELDS' REPLACE INTO TABLE `SYS_FIELDS` FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES STARTING BY 'SYS_FIELDS\t' (`INDEX_ID`, `POS`, `COL_NAME`);

5. Now everything is ready to generate a CREATE TABLE statement for a table:

./sys_parser -u root sakila/actor
CREATE TABLE `actor`(
        `actor_id` SMALLINT UNSIGNED NOT NULL,
        `first_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
        `last_name` VARCHAR(45) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL,
        `last_update` TIMESTAMP NOT NULL,
        PRIMARY KEY (`actor_id`)
) ENGINE=InnoDB;

Finally, two notes:

  • The generated structure differs from the original structure of the table, but it is good enough for the recovery.
  • DECIMAL fields are not fully supported. If a field was of type DECIMAL(5,2), sys_parser will generate DECIMAL(5,0); it has to be corrected manually.

