TokuDB tips: MySQL backups

In my recent post, “TokuDB gotchas: slow INFORMATION_SCHEMA TABLES,” I saw a couple questions and tweets asking if we use TokuDB in production. Actually I mentioned it in that post and we also blogged about it in a couple of other recent posts:

So, yes, we are using Percona Server + TokuDB as a main storage engine in Percona Cloud Tools to store timeseries data.

And, yes, Percona Server + TokuDB is available GA Percona Server 5.6.19-67.0 with TokuDB (GA).

Just having good performance is not enough to make it into production; there are also operational questions and one such question is about backups. I want to explain how we do backups for Percona Server + TokuDB in Percona Cloud Tools.

I should say up front, that we DO NOT have support for TokuDB in Percona XtraBackup. TokuDB internals are significantly different from InnoDB/XtraDB, so it will be a major project to add this to Percona XtraBackup and we do not have any plans at the moment to work on this.

It does not mean that TokuDB users do not have options for backups. There is Tokutek Hot back-up, included in the Tokutek Enterpise Subscription. And there is a method we use in Percona Cloud Tools: LVM Backups. We use mylvmbackup scripts for this task and it works fairly well for us.

There is however some gotchas to be aware. If you understand an LVM backups mechanic, this is basically a managed crash recovery process when you restore from a backup.

Now we need to go in a little detail for TokuDB. To support transactions that involve both TokuDB and InnoDB engines, TokuDB uses a two-phase commit mechanism in MySQL. When involved, the two-phase commit requires binary logs presented for a proper recovery procedures.

But now we need to take a look at how we setup a binary log in Percona Cloud Tools. We used SSD for the main data storage (LVM partition is here) and we use a Hardware RAID1 over two hard-drives for binary logs. We choose this setup as we care about SSD lifetime. In write-intensive workloads, binary logs will produce a lot of write operations and in our calculation we will just burn these SSDs, so we have to store them on something less expensive.

So the problem there is that when we take an LVM snapshot over main storage, we do not have a consistent view of binary logs (although it is possible to modify backup scripts to copy the current binary log under FLUSH TABLES WITH READ LOCK operation, this is probably what we will do next). But binary logs are needed for recovery, without them we face these kind of errors during restoring from backup:

2014-DD-MM 02:15:16 16414 [Note] Found 1 prepared transaction(s) in TokuDB
2014-DD-MM 02:15:16 16414 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2014-DD-MM 02:15:16 16414 [ERROR] Aborting

The error message actually hints a way out. Unfortunately it seems that we are the first ones to have ever tried this option, as tc-heuristic-recover is totally broken in current MySQL and not supposed to work… and it would be noticed if someone really tried it before us (which gives me an impression that Oracle/MySQL never properly tested it, but that is a different story).

We will fix this in Percona Server soon.

So the way to handle a recovery from LVM backup without binary logs is to start mysqld with –tc-heuristic-recover switch (unfortunately I did not figure out yet, should it be COMMIT or ROLLBACK value, hehe).

The proper way to use LVM backup is to have a corresponding binary log file, like I said it will require a modification to mylvmbackup script.

I should say this is not the only way we do backups in Percona Cloud Tools. In this project we use Percona Backup Service provided by the Percona Managed Services team, and our team also uses mydumper to perform a logical backup of data.
While it works acceptably to backup hundreds of gigabytes worth of data (it is just a sequential scan, which should be easy for TokuDB), the full recovery is painful and takes unacceptably long. So mydumper backup (recovery) will be used if we ever need to perform a fine-grained recovery (i.e only small amount of specific tables).

So I hope this tip is useful if you are looking for info about how to do backups for TokuDB.

The post TokuDB tips: MySQL backups appeared first on MySQL Performance Blog.


Considering TokuDB as an engine for timeseries data

TokuDBI am working on a customer’s system where the requirement is to store a lot of timeseries data from different sensors.

For performance reasons we are going to use SSD, and therefore there is a list of requirements for the architecture:

  • Provide high insertion rate
  • Provide a good compression rate to store more data on expensive SSDs
  • Engine should be SSD friendly (less writes per timeperiod to help with SSD wear)
  • Provide a reasonable response time (within ~50 ms) on SELECT queries on hot recently inserted data

Looking on these requirements I actually think that TokuDB might be a good fit for this task.

There are several aspects to consider. This time I want to compare TokuDB vs InnoDB on an initial load time and space consumption.

Let’s assume the schema is following

CREATE TABLE `sensordata` (
  `ts` int(10) unsigned NOT NULL DEFAULT '0',
  `sensor_id` int(10) unsigned NOT NULL,
  `data1` double NOT NULL,
  `data2` double NOT NULL,
  `data3` double NOT NULL,
  `data4` double NOT NULL,
  `data6` double NOT NULL,
  `cnt` int(10) unsigned NOT NULL,
  PRIMARY KEY (`sensor_id`,`ts`)

where sensor_id is in a range from 1 to about 1000 and ts is monotonically increasing timestamp.

This schema exploits both TokuDB and InnoDB clustering primary key, and all inserts are “almost” sequential, which guarantee that all inserts will not require disk access and work with data in memory.
The same for SELECTS – select queries on the most recent time periods will be executed only by a memory access.

I am doing this research on the Dell PowerEdge R420 box with 48GB of memory (40GB for InnoDB buffer pool size, and default memory allocation for TokuDB, which is 24GB for tokudb cache). The storage is a very fast PCI-e Flash card.

The test export CSV file, suitable for LOAD DATA INFILE is 40GB in size and contains over 1 bln records ( exactly)

MySQL Versions:

  • For InnoDB tests I used Percona Server 5.6-RC2
  • For TokuDB tests I used mariadb-5.5.30-tokudb-7.0.4 from Tokutek website

So, first, let’s load data into InnoDB, again, I am using LOAD DATA INFILE statement

  • InnoDB, no compression. Load time is 1 hour 26 min 25.77 sec, final table size is 90GB
  • InnoDB, 8K compression. Load time 3 hours 26 min 17.06 sec, the table size is 45GB
  • InnoDB, 4K compression. Load time 17 hours 23 min 43.48 sec, the table size is 26GB

Now for TokuDB:

  • TokuDB, default compression. Load time 33 min 1.18 sec, the table size on disk is 10GB
  • TokuDB, tokudb_small table format. Load time 37 min 2.34 sec, the table size is 4.6GB

So TokuDB is the obvious leader in both load time and compression. Of course just these are not enough, and now we need to see the performance of further INSERTs and SELECTs queries. This is what I am running right now and will post the results when I have them.

The post Considering TokuDB as an engine for timeseries data appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com