May
09
2011
--

Upcoming webinar on Percona XtraBackup

On 10th May at 9 AM PST I will be giving a webinar about Percona XtraBackup. If you cannot attend at this time, a recorded session will be available soon after the webinar.

We always want to store our data safely, keep it consistent, and be prepared for any kind of disaster. Every DBA carries this recurring task. This webinar will discuss how to make the process of creating consistent physical backups much easier and faster with the Percona XtraBackup utility.

We will discuss:
* XtraBackup operational principles
* Sample usage scenarios
* Full backups
* Incremental and partial backups
* Importing and exporting individual tables
* Streaming and parallel backups
* Limitations and common issues

You may register here.

Dec
25
2010
--

Spreading .ibd files across multiple disks; the optimization that isn’t

Inspired by Baron’s earlier post, here is one I hear quite frequently -

“If you enable innodb_file_per_table, each table is its own .ibd file.  You can then relocate the heavily hit tables to a different location and create symlinks to the original location.”

There are a few things wrong with this advice:

  1. InnoDB does not support these symlinks.  If you run an ALTER TABLE command, what you will find is that a new temporary table is created (in the original location!), the symlink is destroyed, and the temporary table is renamed.  Your “optimization” is lost.
  2. Striping (with RAID) is usually a far better optimization.  Striping a table across multiple disks effectively balances the ‘heavy hit’ access across many more disks.  With one disk per table you are more likely to end up unbalanced, with one disk overloaded and many idle.
  3. You restrict your backup methods.  You can’t LVM snapshot across logical volumes.

Another common claim with this recommendation is that it allows you to quickly add space when running out.  LVM actually allows you to add physical volumes, and increase the size of logical volumes ;)   This is much easier to do than moving one large table around.


Entry posted by Morgan Tocker |
22 comments


Dec
13
2010
--

Percona XtraBackup 1.5-Beta

Percona XtraBackup 1.5-Beta is now available for download.

This release adds additional functionality to Percona XtraBackup 1.4, the current general availability version of XtraBackup.

This is a beta release.

Functionality Added or Changed

  • Support for MySQL 5.5 databases has been implemented. (Yasufumi Kinoshita)
  • XtraBackup can now be built from the MySQL 5.1.52, MySQL 5.5.7, or Percona Server 5.1.53-12 code bases (fixes bug #683507). (Alexey Kopytov)
  • The program is now distributed as three separate binaries: 
    • xtrabackup – for use with Percona Server with the built-in InnoDB plugin
    • xtrabackup_51 – for use with MySQL 5.0 & 5.1 with built-in InnoDB
    • xtrabackup_55 – for use with MySQL 5.5 (this binary is not provided for the FreeBSD platform)
  • Backing up only specific tables can now be done by specifying them in a file, using the --tables-file option. (Yasufumi Kinoshita & Daniel Nichter)
  • Additional checks were added to monitor the rate at which the log file is being overwritten, to determine if XtraBackup is keeping up. If the log file is being overwritten faster than XtraBackup can keep up, a warning is given that the backup may be inconsistent. (Yasufumi Kinoshita)
  • The XtraBackup binaries are now compiled with the -O3 gcc option, which may improve backup speed in stream mode in some cases.
  • It is now possible to copy multiple data files concurrently in parallel threads when creating a backup, using the --parallel option. See the xtrabackup Option Reference and Parallel Backups. (Alexey Kopytov)

Bugs Fixed

  • Bug #683507 – xtrabackup has been updated to build from the MySQL 5.1.52, MySQL 5.5.7, or Percona Server 5.1.53-12 code bases. (Alexey Kopytov)

Release Notes for this and previous releases of Percona XtraBackup can be found in our Wiki.

The latest downloads are available on our website. The latest source code can be found on Launchpad.

Please report any bugs found at Bugs in Percona XtraBackup.

For general questions, use our Percona Discussions Group, and for development questions our Percona Development Group.

For support, commercial, and sponsorship inquiries, contact Percona.


Entry posted by Fred Linhoss |
4 comments


Nov
22
2010
--

Percona XtraBackup 1.4

Percona XtraBackup 1.4 is now available for download.

Version 1.4 fixes problems related to incremental backups. If you do incremental backups, it’s strongly recommended that you upgrade to this release.

Functionality Added or Changed

  • Incremental backups have changed and now allow the restoration of full backups containing certain rollback transactions that previously caused problems. Please see Preparing the Backups and the --apply-log-only option. (From innobackupex, the --redo-only option should be used). (Yasufumi Kinoshita)
  • The XtraBackup Test Suite was implemented and is now a standard part of each distribution. (Aleksandr Kuzminsky)
  • Other New Features:
    • The --prepare option now reports xtrabackup_binlog_pos_innodb if the information exists. (Yasufumi Kinoshita)
    • When --prepare is used to restore a partial backup, the data dictionary is now cleaned and contains only tables that exist in the backup. (Yasufumi Kinoshita)
    • The --table option was extended to accept several regular expression arguments, separated by commas. (Yasufumi Kinoshita)
  • Other Changes:
    • Ported to the Percona Server 5.1.47-11 code base. (Yasufumi Kinoshita)
    • XtraBackup now uses the memory allocators of the host operating system, rather than the built-in InnoDB allocators (see Using Operating System Memory Allocators). (Yasufumi Kinoshita)

Bugs Fixed

  • Bug #595770 – Binaries are stripped by rpmbuild, so __os_install_post is redefined to change the default behaviour. (Aleksandr Kuzminsky)
  • Bug #589639 – Fixed a problem of hanging when tablespaces were deleted during the recovery process. (Yasufumi Kinoshita)
  • Bug #611960 – Fixed a segmentation fault in “xtrabackup”. (Yasufumi Kinoshita)
  • Miscellaneous important fixes related to incremental backups.

Release Notes for this and previous releases of Percona XtraBackup can be found in our Wiki.

The latest downloads are available on our website. The latest source code can be found on Launchpad.

Please report any bugs found at Bugs in Percona XtraBackup.

For general questions, use our Percona Discussions Group, and for development questions our Percona Development Group.

For support, commercial, and sponsorship inquiries, contact Percona.


Entry posted by Fred Linhoss |
5 comments


Nov
09
2010
--

Lost innodb tables, xfs and binary grep

Before I start a story about the data recovery case I worked on yesterday, here’s a quick tip – having a database backup does not mean you can restore from it. Always verify that your backup can be used to restore the database! If not automatically, do this manually, at least once a month. No, seriously – in most of the recovery cases I worked on, customers did have some sort of backup, but it wasn’t working, wasn’t complete, and whatnot. Someone set it up and never bothered to check whether it still worked after a while.

Anyway, this post is not really about backups but rather about a few interesting things I learned during the last recovery case.

First, some facts about the system and how data was lost:

  • MySQL had a dedicated partition on XFS file system
  • Server was running innodb_file_per_table
  • There was a production master and two slaves, all with the same setting
  • Developer accidentally ran DROP DATABASE X on the wrong machine (production master)
  • All slaves followed and dropped their copy of the data
  • The important tables were all InnoDB
  • Having a backup, the customer first attempted to restore from it on the production master

Luckily (or rather, unfortunately) the backup only had table definitions but not the data, so no data was written to the file system. Mind, however, that restoring a backup could have been fatal had it written some junk data, as that would have overwritten the deleted files. Now, here’s what I learned while working on this case:

Recovering from XFS is possible. Just a month ago we had a team meeting in Mallorca where we went through various data loss scenarios. One of them was deleted files on xfs – we all agreed on a few things:

  • recovering files from xfs is hard, if at all possible
  • we had no recovery cases on xfs, most likely because:
  • whoever is using xfs, is smart enough to have backups set up properly

Now I’m not picking on the customer or anything – indeed they did have a backup set up, it’s just that some (most important) tables weren’t backed up. We did not try any of the file recovery tools for xfs – apparently they all target specific file types, and sure enough InnoDB is not one of the supported ones. What we did was simply run page_parser on the (already) unmounted file system, treating it as a raw device. I was surprised how amazingly simple and fast it was (did you know that the latest version of page_parser identifies pages by infimum and supremum records?) – a 10G partition was scanned in about 5 minutes and all 4G of InnoDB pages were successfully written to a separate partition. That’s the easy part though – you run page_parser, wait and see what you get.

If the InnoDB data dictionary had not been overwritten by the attempt to restore from the backup, the second part would have been quite easy too. But it was overwritten, so I could no longer identify the correct index id of the primary key for specific tables by simply mapping data dictionary records to index records. Instead I had to grep for specific character sequences across all pages. Note, however, that this only works for text in uncompressed text columns (varchar, char, text). What if the tables don’t have any text columns at all? Read on.

GNU grep won’t match binary strings. This isn’t new; I kind of knew grep couldn’t look for binary “junk”, but I really needed it to. Why? Well, here are a few of the scenarios we went through yesterday:

1. There was this rather big table with only integer and enum columns, where we knew a rather unique PK value, something like 837492636, so we needed a way to find the pages containing it. InnoDB stores an INT in 4 binary bytes rather than as a sequence of digit characters, so “grep -r 837492636 /dir” would not have worked.

2. There was another table, a small one with 4 smallint columns, where all we could match on was a sequence of numbers from a single record – the customer knew that there was at least one row with the following sequence: 7, 3, 7, 8. Matching on any single number would be insane as it would match all of the pages, while matching on the numbers as a sequence of characters would not work for the same reason as above.

This is where I found bgrep, which was exactly the tool for the task. In case number one, I just converted the number 837492636 to its hexadecimal representation 0x31EB1F9C and ran “bgrep 31EB1F9C /dir” – there were only about 10 other matches across the 4 gigabytes of pages, some of them probably from secondary index pages, but when you only have that many pages to check it’s really simple.

The second case seemed somewhat more complicated, but it really wasn’t – all of the columns were fixed size, 2 bytes each, so the thing we had to look for was this sequence: 0007000300070008. I was expecting a lot of false matches, but in fact I ended up with only one match, pointing exactly to the right page and so to the right index id.

The other thing I would note about bgrep is that it was much faster than matching text using grep, so if you have a lot of data to scan and can choose between matching text and matching a number, matching a number using bgrep may work much better.
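The conversions used in both cases can be sketched in a few lines of Python. This is only an illustration, not part of any Percona tool: the function names are made up, and it assumes integers are stored big-endian, which is what the patterns above imply. The `signed` option reflects my understanding that InnoDB flips the sign bit of signed integer columns; treat that as an assumption and verify it against your own pages.

```python
def int_to_bgrep_hex(value, size=4, signed=False):
    """Return a hex pattern for one integer column value (hypothetical helper).

    Assumptions: the value is stored big-endian in `size` bytes; for signed
    columns the sign bit is flipped so that values sort as unsigned bytes.
    """
    raw = value.to_bytes(size, byteorder="big", signed=signed)
    if signed:
        raw = bytes([raw[0] ^ 0x80]) + raw[1:]
    return raw.hex().upper()


def row_to_bgrep_hex(values, size=2, signed=False):
    """Concatenate the patterns of several adjacent fixed-size columns."""
    return "".join(int_to_bgrep_hex(v, size, signed) for v in values)


print(int_to_bgrep_hex(837492636))     # 31EB1F9C
print(row_to_bgrep_hex([7, 3, 7, 8]))  # 0007000300070008
```

The resulting strings are what gets passed to bgrep as the search pattern.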

We are considering shipping bgrep as part of the Percona recovery toolset, with some additional converters so we can match against various date/time columns as well.


Entry posted by Aurimas Mikalauskas |
10 comments


Nov
08
2010
--

An argument for not using mysqldump

I have a 5G mysqldump which takes 30 minutes to restore from backup.  That means that when the database reaches 50G, it should take 30 minutes × 10 = 5 hours to restore.  Right?  Wrong.

Mysqldump recovery time is not linear.  Bigger tables, or tables with more indexes will always take more time to restore.

If I restore from a raw backup (LVM snapshot, xtrabackup, innodb hot backup), it is very easy to model how much longer recovery time will take:

Backup is 80G
Copy is at 70MB/s.
10G is already complete.
= ((80-10) * 1024)/70/60 = ~17 minutes
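The same estimate as a small Python function, purely as an illustration of the arithmetic above (the function and parameter names are mine):

```python
def remaining_minutes(total_gb, done_gb, rate_mb_per_s):
    """Estimate the minutes left for a raw copy running at a steady rate."""
    remaining_mb = (total_gb - done_gb) * 1024
    return remaining_mb / rate_mb_per_s / 60


# 80G backup, 10G already copied, 70MB/s
print(round(remaining_minutes(80, 10, 70)))  # 17
```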

I can track progress with mysqldump by monitoring the rate at which show global status like 'Handler_write'; increases and comparing it to my knowledge of roughly how many rows are in each table.  But progress != a magic number like “17 minutes”.  Not unless I do a lot of complex modeling.

I am not saying a 5 hour recovery is good or bad.  What I am saying is that knowing the remaining time is very important during disaster recovery.  Being able to say “we’ll be back at 2PM” is much better than saying “we’ll be back between 1PM and 4PM... maybe”.


Entry posted by Morgan Tocker |
22 comments


Jul
31
2010
--

Why you can’t rely on a replica for disaster recovery

A couple of weeks ago one of my colleagues and I worked on a data corruption case that reminded me that sometimes people make unsafe assumptions without knowing it. This one involved SAN snapshotting that was unsafe.

In a nutshell, the client used SAN block-level replication to maintain a standby/failover MySQL system, and there was a failover that didn’t work; both the primary and fallback machine had identically corrupted data files. After running fsck on the replica, the InnoDB data files were entirely deleted.

When we arrived on the scene, there was a data directory with an 800+ GB data file, which we determined had been restored from a SAN snapshot. Accessing this file caused a number of errors, including warnings about accessing data outside of the partition boundaries. We were eventually able to coax the filesystem into truncating the data file back to a size that didn’t contain invalid pointers and could be read without errors on the filesystem level. From InnoDB’s point of view, though, it was still completely corrupted. The “InnoDB file” contained blocks of data that were obviously from other files, such as Python exception logs. The SAN snapshot was useless for practical purposes. (The client decided not to try to extract the data from the corrupted file, which we have specialized tools for doing. It’s an intensive process that costs a little money.)

The problem was that the filesystem was ext2, with no journaling and no consistency guarantees. A snapshot on the SAN is just the same as cutting the power to the machine: the block device is in an inconsistent state. A filesystem that can survive that has to ensure that it writes the data to the block device such that it can be brought into a consistent state later. The techniques for doing this include things like ordered writes and meta-data journaling. But ext2 does not know how to do that. The data that’s seen by the SAN is some jumble of blocks that represents the most efficient way to transfer the changed blocks over the interconnect, without regard to logical consistency on the filesystem level.

Two things can help avoid such a disaster: 1) get qualified advice and 2) don’t trust the advice; backups and disaster recovery plans must be tested periodically.

This case illustrates an important point that I repeat often. The danger of using a replica as a backup is that data loss on the primary can affect the replica, too. This is true no matter what type of replication is being used. In this case it’s block-level SAN replication. DRBD would behave just the same way. At a higher level, MySQL replication has the same weakness. If you rely on a MySQL slave for a “backup,” you’ll be out of luck when someone accidentally runs DROP TABLE on your master. That statement will promptly replicate over and drop the table off your “backup.”

I still see people using a replica as a backup, and I know it’s just a matter of time before they lose data. In my experience, the types of errors that will propagate through replication are much more common than those that’ll be isolated to just one machine, such as hardware failures.


Entry posted by Baron Schwartz |
24 comments


Jul
01
2010
--

Recover BLOB fields

For a long time, long types like BLOB and TEXT were not supported by the Percona InnoDB Recovery Tool. The reason lies in the special way InnoDB stores BLOBs.

An InnoDB table is stored in a clustered index called PRIMARY. It must exist even if the user hasn’t defined a primary key. The PRIMARY index pages are identified by an 8-byte number, index_id. The highest 4 bytes are always 0, so index_id is often written as 0:<4-byte number>, e.g. 0:258. The pages are organized in a B-tree, with the primary key used as the key. Inside a page, records are stored in a linked list.

An InnoDB page is 16k by default. Obviously, if a record is too long, a single page can’t store it. If the total record size is less than UNIV_PAGE_SIZE/2 - 200 (7992 bytes with the default page size, so just under 8k) then the full record is stored in a page of the PRIMARY index. Let’s call such pages internal. In the InnoDB sources they have type FIL_PAGE_INDEX*. If the record is longer than that, only the first 768 bytes of every BLOB field are stored internally. The rest is stored in external pages, which have type FIL_PAGE_TYPE_BLOB. The page type is stored in the FIL_PAGE_TYPE field of the page header. In an earlier post Peter described in detail how BLOBs are stored.
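As a quick check of the arithmetic, here is the threshold worked out in Python. This is only a sketch of the rule as stated in this post; the constant and function names are chosen for illustration and are not taken from the InnoDB sources, apart from UNIV_PAGE_SIZE.

```python
UNIV_PAGE_SIZE = 16384  # default InnoDB page size in bytes


def max_internal_record_size(page_size=UNIV_PAGE_SIZE):
    """Size below which a whole record stays in the PRIMARY index page."""
    return page_size // 2 - 200


def is_stored_internally(record_size, page_size=UNIV_PAGE_SIZE):
    """True if the full record fits in an internal (FIL_PAGE_INDEX) page."""
    return record_size < max_internal_record_size(page_size)


print(max_internal_record_size())   # 7992
print(is_stored_internally(5000))   # True
print(is_stored_internally(20000))  # False
```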

Let me illustrate the record format using an example table:

CODE:

  CREATE TABLE `t1` (
    `ID` int(11) unsigned NOT NULL,
    `NAME` varchar(120),
    `N_FIELDS` int(10),
    PRIMARY KEY (`ID`)
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Here the COMPACT format is used, which is the default in MySQL >= 5.1.

The record consists of four parts:

  1. Offsets. Effectively these are field lengths. Only variable-length types have offsets. An offset can be one or two bytes, depending on the maximum field size. The highest bit of the offset is 1 if the field is stored in external pages (i.e. a long BLOB field).
  2. NULL fields. One bit per NULL-able field, padded to the minimum number of bytes needed to store all the flags.
  3. So-called extra bytes. These are 5 bytes where different flags are stored, like the “record is deleted” flag. The last two bytes are the relative pointer to the next record in the page.
  4. User data. TRX_ID is a transaction id. PTR_ID is a pointer in a rollback segment to the old version of the record.

So a long field has two distinguishing marks: 1) the highest bit of its offset is set to “1”, and 2) after the first 768 bytes there are 20 bytes at the end that hold the following:

  1. BTR_EXTERN_SPACE_ID – space id where the next piece of the field is stored
  2. BTR_EXTERN_PAGE_NO – page id
  3. BTR_EXTERN_OFFSET – offset inside a page. An external page has a header. A similar pointer to the next page is stored in it.
  4. BTR_EXTERN_LEN – length of the next piece.

The external pages are linked until BTR_EXTERN_PAGE_NO is FIL_NULL.
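To make the 20-byte pointer more concrete, here is a minimal Python sketch that splits it into the four fields listed above. It is not code from the recovery tool. The field widths (4 + 4 + 4 + 8 bytes, big-endian), the value of FIL_NULL (0xFFFFFFFF) and the use of only the low 4 bytes of the length are assumptions based on my reading of the InnoDB sources, and the function name is hypothetical.

```python
import struct

FIL_NULL = 0xFFFFFFFF  # assumption: "no page" marker
FIELD_REF_SIZE = 20    # the 20 bytes that follow the 768-byte prefix


def parse_extern_ref(field):
    """Decode the external pointer stored in the last 20 bytes of a long field."""
    if len(field) < FIELD_REF_SIZE:
        raise ValueError("field is too short to hold an external pointer")
    ref = field[-FIELD_REF_SIZE:]
    # Assumed layout: space id, page number, offset (4 bytes each), then an
    # 8-byte length whose upper half may carry flags, so it is read as two words.
    space_id, page_no, offset, _len_hi, len_lo = struct.unpack(">IIIII", ref)
    return {
        "space_id": space_id,
        "page_no": page_no,
        "offset": offset,
        "length": len_lo,
        "end_of_chain": page_no == FIL_NULL,
    }


# Made-up example: a 768-byte prefix followed by a pointer to page 42
sample = b"x" * 768 + struct.pack(">IIIII", 5, 42, 38, 0, 1000)
print(parse_extern_ref(sample))
```

If any of these fields is damaged, the chain of external pages cannot be followed from this record.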

The Percona InnoDB Recovery Tool now supports recovery of long fields. This is still in the development branch, but should be released after QA tests.

The complexity of BLOB fields imposes a prerequisite for successfully recovering a record with a BLOB: all pieces of the BLOB field must be reachable by pointers. That means BTR_EXTERN_PAGE_NO, BTR_EXTERN_OFFSET and BTR_EXTERN_LEN must not be corrupted.

The tool outputs the recovered table in tab-separated values format. BLOBs are printed in a hex form – 0ACD86…

To load the table back, use the UNHEX function:

CODE:

  -- Load the tab-separated dump; hex-encoded BLOBs are decoded with UNHEX
  LOAD DATA INFILE '/path/to/datafile'
  REPLACE INTO TABLE <table_name>
  FIELDS TERMINATED BY '\t'
  OPTIONALLY ENCLOSED BY '"'
  LINES STARTING BY '<table_name>\t'
  (id,sessionid,uniqueid,username,nasipaddress,@var1,@var2,etc)
  SET
    blobfield = UNHEX(@var1),
    datefield2 = FROM_UNIXTIME(@var2,'%Y %D %M %h:%i:%s %x');

* – there is a typo in Recovery of Lost or Corrupted InnoDB Tables Presentation


Entry posted by Aleksandr Kuzminsky |
4 comments


Apr
24
2010
--

How fast is FLUSH TABLES WITH READ LOCK?

A week or so ago at the MySQL conference, I visited one of the backup vendors in the Expo Hall. I started to chat with them about their MySQL backup product. One of the representatives told me that their backup product uses FLUSH TABLES WITH READ LOCK, which he admitted takes a global lock on the whole database server. However, he proudly told me that it only takes a lock for “a couple of milliseconds.” This is a harmful misconception that many backup vendors seem to hold dear.

The truth is, this command can take a lock for an indeterminate amount of time. It might complete in milliseconds on a test system in a laboratory, but I have seen it take an extremely long time on production systems, measured in many minutes, or potentially even hours. And during this time, the server will get completely blocked (not just read-only!). To understand why, let’s look at what this command actually does. There are several important parts of processing involved in the command.

Requesting the lock

The FLUSH TABLES WITH READ LOCK command immediately requests the global read lock. As soon as this happens, even before the lock is granted to it, all other processes attempting to modify anything in the system are locked out. In theory, this might not seem so bad because after all, the command acquires only a read lock. Other commands that need only a read lock can coexist with this. However, in practice, most tables are both read and written. The first write query to each table will immediately block against the requested global read lock, and subsequent read queries will block against the write query’s requested table lock, so the real effect is that the table is exclusively locked, and all new requests into the system are blocked. Even read queries!

Waiting for the lock

Before the FLUSH TABLES WITH READ LOCK command can successfully acquire the lock, anything else that currently holds the lock must finish what it’s doing. That means that every currently running query, including SELECT queries, must finish. So if there is a long-running query on the system, or an open transaction or another process that holds a table lock, the FLUSH TABLES WITH READ LOCK command itself will block until the other queries finish and all locks are released. This can take a very long time. It is not uncommon for me to log on to a customer’s system and see a query that has been running for minutes or hours. If such a query were to begin running just before the FLUSH TABLES WITH READ LOCK command is issued, the results could be very bad.

Here’s one example of what the system can look like while this process is ongoing:

SQL:

  mysql> SHOW processlist;
  +----+------+-----------+------+------------+------+-------------------+----------------------------------------------------------------------+
  | Id | User | Host      | db   | Command    | Time | State             | Info                                                                 |
  +----+------+-----------+------+------------+------+-------------------+----------------------------------------------------------------------+
  |  4 | root | localhost | test | Query      |   80 | Sending data      | SELECT count(*) FROM t t1 JOIN t t2 JOIN t t3 JOIN t t4 WHERE t1.b=0 |
  |  5 | root | localhost | test | Query      |   62 | Flushing tables   | FLUSH TABLES WITH READ LOCK                                          |
  |  6 | root | localhost | test | Field List |   35 | Waiting for table |                                                                      |
  |  7 | root | localhost | test | Query      |    0 | NULL              | SHOW processlist                                                     |
  +----+------+-----------+------+------------+------+-------------------+----------------------------------------------------------------------+
  4 rows in set (0.00 sec)

Notice that connection 6 can’t even log in because it was a MySQL command-line client that wasn’t started with -A, and it’s trying to get a list of tables and columns in the current database for tab-completion. Note also that “Flushing tables” is a misnomer — connection 5 is not flushing tables yet. It’s waiting to get the lock.
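One way a backup script could guard against this situation is to look at the process list before requesting the lock and hold off while long-running queries are present. The Python sketch below shows only the filtering step on rows that have already been fetched. The function name, the 10-second threshold and the dictionary layout are assumptions for illustration, not part of any existing backup tool.

```python
def long_running_queries(processlist, max_seconds=10):
    """Return the rows that would delay FLUSH TABLES WITH READ LOCK.

    `processlist` is a list of dicts with the SHOW PROCESSLIST column names.
    """
    blockers = []
    for row in processlist:
        info = (row["Info"] or "").upper()
        if (row["Command"] == "Query"
                and row["Time"] > max_seconds
                and "FLUSH TABLES WITH READ LOCK" not in info):
            blockers.append(row)
    return blockers


# Rows taken from the process list above
rows = [
    {"Id": 4, "Command": "Query", "Time": 80,
     "Info": "SELECT count(*) FROM t t1 JOIN t t2 JOIN t t3 JOIN t t4 WHERE t1.b=0"},
    {"Id": 5, "Command": "Query", "Time": 62, "Info": "FLUSH TABLES WITH READ LOCK"},
    {"Id": 6, "Command": "Field List", "Time": 35, "Info": None},
    {"Id": 7, "Command": "Query", "Time": 0, "Info": "SHOW processlist"},
]
print([row["Id"] for row in long_running_queries(rows)])  # [4]
```

A check like this narrows the window but cannot close it: a slow query can still start between the check and the lock request.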

Flushing tables

After the FLUSH TABLES WITH READ LOCK command finally acquires the lock, it must begin flushing data. This does not apply to all storage engines. However, MyISAM does not attempt to flush its own data to the disk during normal processing. It relies on the operating system to flush the data blocks to disk when it decides to. As a result, a system that has a lot of MyISAM data might have a lot of dirty blocks in the operating system buffer cache. This can take a long time to flush. During that time, the entire system is still locked. After all the data has been flushed, the FLUSH TABLES WITH READ LOCK command completes and sends its response to the client that issued it.

Holding the lock

The final part of this command is the duration during which the lock is held. The lock is released with UNLOCK TABLES or a number of other commands. Most backup systems that use FLUSH TABLES WITH READ LOCK are performing a relatively short operation inside of the lock, such as initiating a filesystem snapshot. So in practice, this often ends up being the shortest portion of the operation.

Conclusion

A backup system that is designed for real production usage must not assume that FLUSH TABLES WITH READ LOCK will complete quickly. In some cases, it is unavoidable. This includes backing up a mixture of MyISAM and InnoDB data. But many installations do not mix their data this way, and should be able to configure a backup system to avoid this global lock. There is no reason to take a lock at all for backing up only InnoDB data. Completely lock-free backups are easy to take. Backup vendors should build this capability into their products.


Entry posted by Baron Schwartz |
27 comments
