Upcoming Webinar September 14, 2017: Supercharge Your Analytics with ClickHouse


ClickHouseJoin Percona’s CTO Vadim Tkachenko @VadimTk and Altinity’s Co-Founder, Alexander Zaitsev as they present Supercharge Your Analytics with ClickHouse on Thursday, September 14, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).


ClickHouse is a real-time analytical database system. Even though they’re only celebrating one year as open source software, it has already proved itself ready for serious workloads.

We will talk about ClickHouse in general, some of its internals and why it is so fast. ClickHouse works in conjunction with MySQL – traditionally weak for analytical workloads – and this presentation demonstrates how to make the two systems work together.

There will also be an in-person presentation on How to Build Analytics for 100bn Logs a Month with ClickHouse at the meetup Wednesday, September 13, 2017. RSVP here.

Alexander Zaitsev will also be speaking at Percona Live Europe 2017 on Building Multi-Petabyte Data Warehouses with ClickHouse on Wednesday, September 27 at 11:30 am. Use the promo code “SeeMeSpeakPLE17” for 15% off.

Alexander ZaitsevAlexander Zaitsev
Altinity’s Co-Founder
Alexander is a co-founder of Altinity. He has 20 years of engineering and engineering management experience in several international companies. Alexander is expert in high scale analytics systems design and implementation. He designed and deployed petabyte scale data warehouses, including one of earliest ClickHouse deployments outside of Yandex.

Vadim Tkachenko
Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products.

He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High Performance Group within the official MySQL support team. Vadim received a BS in Economics and an MS in computer science from the National Technical University of Ukraine. He now lives in California with his wife and two children.


Checkpoint strikes back

In my recent benchmarks for MongoDB, we can see that the two engines WiredTiger and TokuMX struggle from periodical drops in throughput, which is clearly related to a checkpoint interval – and therefore I correspond it to a checkpoint activity.

The funny thing is that I thought we solved checkpointing issues in InnoDB once and for good. There are bunch of posts on this issue in InnoDB, dated some 4 years ago.  We did a lot of research back then working on a fix for Percona Server

But, like Baron said, “History Repeats“… and it seems it also repeats in technical issues.

So, let’s take a look what is checkpointing is, and why it is a problem for storage engines.

As you may know, a transactional engine writes transactional logs (name you may see: “redo logs”, write-ahead logs or WAL etc),
to be able to perform a crash recovery in the case of database crash or server power outage. To maintain somehow the time for recovery (we all expect that database will start quick), the engine has to limit how many changes are in logs. For this a transactional engine performs a “checkpoint,” which basically synchronizes changes in memory with corresponding changes in logs – so old log records can be deleted. Often it results in writing changed database pages from memory to a permanent storage.

InnoDB takes an approach to limit size of log files (in total it equals to innodb-log-file-size * innodb-log-files-in-group), which I name “size limited checkpointing”.
Both TokuDB and WiredTiger limit changes by time periods (by default 60 sec), which results in that log files may grow unlimited within a given time interval (I name it “time limited checkpointing”).

Also the difference is InnoDB takes a “fuzzy checkpointing” approach, which was not really “fuzzy”, until we fixed it, but basically it is not to wait until we reach size limited, but perform “checkpointing” all the time, and the more intensive the closer we get to log size limit. This allows to achieve more or less a smooth throughput, without significant drops in throughput.

Unlike InnoDB, both TokuDB (that I am sure) and WiredTiger (I speculate here, I did not look into WiredTiger internal) wait till the last moment, and perform checkpoint strictly by prescribed interval. If it happens that a database contains many changes in memory, it will result in performance stalls. This effect is close by effect to “hitting a wall on full speed”: user queries get locked, until an engine writes all changes in memory it has to write.

Interestingly enough, RocksDB, because it has a different architecture (I may write on it in future, but for now I will point to RocksDB Wiki), does not have this problem with checkpoints (it however has its own background activity, like level compactions and tombstone maintenance, but it is a different topic).

I do not know how WiredTiger is going to approach this issue with checkpoint, but we are looking to improve TokuDB to make this less painful for user queries – and eventually to move to “fuzzy checkpointing” model.

The post Checkpoint strikes back appeared first on Percona Data Performance Blog.


InnoDB vs TokuDB in LinkBench benchmark

Previously I tested Tokutek’s Fractal Trees (TokuMX & TokuMXse) as MongoDB storage engines – today let’s look into the MySQL area.

I am going to use modified LinkBench in a heavy IO-load.

I compared InnoDB without compression, InnoDB with 8k compression, TokuDB with quicklz compression.
Uncompressed datasize is 115GiB, and cachesize is 12GiB for InnoDB and 8GiB + 4GiB OS cache for TokuDB.

Important to note is that I used tokudb_fanout=128, which is only available in our latest Percona Server release.
I will write more on Fractal Tree internals and what does tokudb_fanout mean later. For now let’s just say it changes the shape of the fractal tree (comparing to default tokudb_fanout=16).

I am using two storage options:

  • Intel P3600 PCIe SSD 1.6TB (marked as “i3600” on charts) – as a high end performance option
  • Crucial M500 SATA SSD 900GB (marked as “M500” on charts) – as a low end SATA SSD

The full results and engine options are available here

Results on Crucial M500 (throughput, more is better)

Crucial M500

    Engine Throughput [ADD_LINK/10sec]

  • InnoDB: 6029
  • InnoDB 8K: 6911
  • TokuDB: 14633

There TokuDB outperforms InnoDB almost two times, but also shows a great variance in results, which I correspond to a checkpoint activity.

Results on Intel P3600 (throughput, more is better)

Intel P3600

  • Engine Throughput [ADD_LINK/10sec]
  • InnoDB: 27739
  • InnoDB 8K: 9853
  • TokuDB: 20594

To understand the reasoning why InnoDB shines on a fast storage let’s review IO usage by all engines.
Following chart shows Reads in KiB, that engines, in average, performs for a request from client.

IO Reads

Following chart shows Writes in KiB, that engines, in average, performs for a request from client.

IO Writes

There we can make interesting observations that TokuDB on average performs two times less writes than InnoDB, and this is what allows TokuDB to be better on slow storages. On a fast storage, where there is no performance penalty on many writes, InnoDB is able to get ahead, as InnoDB is still better in using CPUs.

Though, it worth remembering, that:

  • On a fast expensive storage, TokuDB provides a better compression, which allows to store more data in limited capacity
  • TokuDB still writes two time less than InnoDB, that mean twice longer lifetime for SSD (still expensive).

Also looking at the results, I can make the conclusion that InnoDB compression is inefficient in its implementation, as it is not able to get benefits: first, from doing less reads (well, it helps to get better than uncompressed InnoDB, but not much); and, second, from a fast storage.

The post InnoDB vs TokuDB in LinkBench benchmark appeared first on Percona Data Performance Blog.


Fractal Tree library as a Key-Value store

As you may know, Tokutek is now part of Percona and I would like to explain some internals of TokuDB and TokuMX – what performance benefits they bring, along with further optimizations we are working on.

However, before going into deep details, I feel it is needed to explain the fundamentals of Key-Value store, and how Fractal Tree handles it.

Before that, allow me to say that I hear opinions that the “Fractal Tree” name does not reflect an internal structure and looks more like a marketing term than a technical one. I will not go into this discussion and will keep using name “Fractal Tree” just out of the respect to inventors. I think they are in a position to name their invention with any name they want.

So with that said, the Fractal Tree library implements a new data structure for a more efficient handling (with main focus on insertion, but more on this later) of Key-Value store.

You may question how Key-Value is related to SQL Transactional databases – this is more from the NOSQL world. Partially this is true, and Fractal Tree Key-Value library is successfully used in Percona TokuMX (based on MongoDB 2.4) and Percona TokuMXse (storage engine for MongoDB 3.0) products.

But if we look on a Key-Value store in general, actually it maybe a good fit to use in structural databases. To explain this, let’s take a look in Key-Value details.

So what is Key-Value data structure?

We will use a notation (k,v), or key=>val, which basically mean we associate some value “v” with a key “k”. For software developers following analogies may be close:
key-value access is implemented as dictionary in Python, associative array in PHP or map in C++.
(More details in Wikipedia)

I will define key-value structure as a list of pairs (k,v).

It is important to note that both key and value cannot be just scalars (single value), but to be compound.
That is "k1, k2, k3 => v1, v2", which we can read as (give me two values by a 3-part key).

This brings us closer to a database table structure.
If we apply additional requirement that all (k) in list (k,v) must be unique, this will represent
a PRIMARY KEY for a traditional database table.
To understand this better, let’s take a look on following table:
CREATE TABLE metrics (
ts timestamp,
device_id int,
metric_id int,
cnt int,
val double,
PRIMARY KEY (ts, device_id, metric_id),
KEY metric_id (metric_id, ts),
KEY device_id (device_id, ts)

We can state that Key-Value structure (ts, device_id, metric_id => cnt, val), with a requirement
"ts, device_id, metric_id" to be unique, represents PRIMARY KEY for this table, actually this is how InnoDB (and TokuDB for this matter) stores data internally.

Secondary indexes also can be represented in Key=>Value notion, for example, again, how it is used in TokuDB and InnoDB:
(seconday_index_key=>primary_key), where a key for a secondary index points to a primary key (so later we can get values by looking up primary key). Please note that that seconday_index_key may not be unique (unless we add an UNIQUE constraint to a secondary index).

Or if we take again our table, the secondary keys are defined as
(metric_id, ts => ts, device_id, metric_id)
(device_id, ts => ts, device_id, metric_id)

It is expected from a Key-Value storage to support basic data manipulation and extraction operations, such as:

        – Add or Insert: add

(key => value)

        pair to a collection
        – Update: from

(key => value2)


(key => value2)

        , that is update


        assigned to


        – Delete: remove(key): delete a pair

(key => value)

        from a collection
        – Lookup (select): give a


        assigned to


and I want to add fifth operation:

        – Range lookup: give all values for keys defined by a range, such as

"key > 5"


"key >= 10 and key < 15"

They way software implements an internal structure of Key-Value store defines the performance of mentioned operations, and especially if datasize of a store grows over a memory capacity.

For the decades, the most popular data structure to represent Key-Value store on disk is B-Tree, and within the reason. I won’t go into B-Tree details (see for example, but it provides probably the best possible time for Lookup operations. However it has challenges when it comes to Insert operations.

And this is an area where newcomers to Fractal Tree and LSM-tree ( propose structures which provide a better performance for Insert operations (often at the expense of Lookup/Select operation, which may become slower).

To get familiar with LSM-tree (this is a structure used by RocksDB) I recommend And as for Fractal Tree I am going to cover details in following posts.

The post Fractal Tree library as a Key-Value store appeared first on MySQL Performance Blog.


MongoDB benchmark: sysbench-mongodb IO-bound workload comparison

In this post I’ll share the results of a sysbench-mongodb benchmark I performed on my server. I compared MMAP, WiredTiger, RocksDB and TokuMXse (based on MongoDB 3.0) and TokuMX (based on MongoDB 2.4) in an IO-intensive workload.

The full results are available here, and below I’ll just share the summary chart:

MongoDB benchmarks: sysbench-mongodb IO-bound workload

I would like to highlight that this benchmark was designed to emulate a heavy IO load on a (relatively) slow IO subsystem. This use case, I believe, is totally valid and represents frequently used “cloud” setups with limited memory and slow IO.

The WiredTiger engine, as B-Tree based, is expected to perform worse comparing to RocksDB and Toku Fractal Trees, which, are designed to handle IO-intensive workloads. My assumption is that WiredTiger will perform better (or even outperform others) for CPU intensive in-memory workloads (see for example Mark Callaghan’s results). Also WiredTiger is expected to perform better with faster storage.

The post MongoDB benchmark: sysbench-mongodb IO-bound workload comparison appeared first on MySQL Performance Blog.


Using Cgroups to Limit MySQL and MongoDB memory usage

Quite often, especially for benchmarks, I am trying to limit available memory for a database server (usually for MySQL, but recently for MongoDB also). This is usually needed to test database performance in scenarios with different memory limits. I have physical servers with the usually high amount of memory (128GB or more), but I am interested to see how a database server will perform, say if only 16GB of memory is available.

And while InnoDB usually respects the setting of innodb_buffer_pool_size in O_DIRECT mode (OS cache is not being used in this case), more engines (TokuDB for MySQL, MMAP, WiredTiger, RocksDB for MongoDB) usually get benefits from OS cache, and Linux kernel by default is generous enough to allocate as much memory as available. There I should note that while TokuDB (and TokuMX for MongoDB) supports DIRECT mode (that is bypass OS cache), we found there is a performance gain if OS cache is used for compressed pages.

Well, an obvious recommendation on how to restrict available memory would be to use a virtual machine, but I do not like this because virtualization does come cheap and usually there are both CPU and IO penalties.

Other popular options I hear are:

  • to use "mem=" option in a kernel boot line. Despite the fact that it requires a server reboot by itself (so you can’t really script this and leave for automatic iterations through different memory options), I also suspect it does not work well in a multi-node NUMA environment – it seems that a kernel limits memory only from some nodes and not from all proportionally
  • use an auxiliary program that allocates as much memory as you want to make unavailable and execute mlock call. This option may work, but I again have an impression that the Linux kernel does not always make good choices when there is a huge amount of locked memory that it can’t move around. For example, I saw that in this case Linux starts swapping (instead of decreasing cached pages) even if vm.swappiness is set to 0.

Another option, on a raising wave of Docker and containers (like LXC), is, well, to use docker or another container… put a database server inside a container and limit resources this way. This, in fact, should work, but if you are lazy as I am, and do not want to deal with containers, we can just use Cgroups (, which in fact are extensively used by mentioned Docker and LXC.

Using cgroups, our task can be accomplished in a few easy steps.

1. Create control group: cgcreate -g memory:DBLimitedGroup (make sure that cgroups binaries installed on your system, consult your favorite Linux distribution manual for how to do that)
2. Specify how much memory will be available for this group:
echo 16G > /sys/fs/cgroup/memory/DBLimitedGroup/memory.limit_in_bytesThis command limits memory to 16G (good thing this limits the memory for both malloc allocations and OS cache)
3. Now, it will be a good idea to drop pages already stayed in cache:
sync; echo 3 > /proc/sys/vm/drop_caches
4. And finally assign a server to created control group:

cgclassify -g memory:DBLimitedGroup `pidof mongod`

This will assign a running mongod process to a group limited by only 16GB memory.

On this, our task is accomplished… but there is one more thing to keep in mind.

This are dirty pages in the OS cache. As long as we rely on OS cache, Linux will control writing from OS cache to disk by two variables:
/proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio.

These variables are percentage of memory that Linux kernel takes as input for flushing of dirty pages.

Let’s talk about them a little more. In simple terms:
/proc/sys/vm/dirty_background_ratio which by default is 10 on my Ubuntu, meaning that Linux kernel will start background flushing of dirty pages from OS cache, when amount of dirty pages reaches 10% of available memory.

/proc/sys/vm/dirty_ratio which by default is 20 on my Ubuntu, meaning that Linux kernel will start foreground flushing of dirty pages from OS cache, when amount of dirty pages reaches 20% of available memory. Foreground means that user threads executing IO might be blocked… and this is what will cause IO stalls for a user (and we want to avoid at all cost).

Why this is important to keep in mind? Let’s consider 20% from 256GB (this is what I have on my servers), this is 51.2GB, which database can make dirty VERY fast in write intensive workload, and if it happens that server has a slow storage (HDD RAID or slow SATA SSD), it may take long time for Linux kernel to flush all these pages, while stalling user’s IO activity meantime.

So it is worth to consider changing these values (or corresponding /proc/sys/vm/dirty_background_bytes and /proc/sys/vm/dirty_bytes if you like to operate in bytes and not in percentages).

Again, it was not important for our traditional usage of InnoDB in O_DIRECT mode, that’s why we did not pay much attention before to Linux OS cache tuning, but as soon as we start to rely on OS cache, this is something to keep in mind.

Finally, it’s worth remembering that dirty_bytes and dirty_background_bytes are related to ALL memory, not controlled by cgroups. It applies also to containers, if you are running several Docker or LXC containers on the same box, dirty pages among ALL of them are controlled globally by a single pair of dirty_bytes and dirty_background_bytes.

It may change it future Linux kernels, as I saw patches to apply dirty_bytes and dirty_background_bytes to cgroups, but it is not available in current kernels.

The post Using Cgroups to Limit MySQL and MongoDB memory usage appeared first on MySQL Performance Blog.


LinkBenchX: benchmark based on arrival request rate

An idea for a benchmark based on the “arrival request” rate that I wrote about in a post headlined “Introducing new type of benchmark” back in 2012 was implemented in Sysbench. However, Sysbench provides only a simple workload, so to be able to compare InnoDB with TokuDB, and later MongoDB with Percona TokuMX, I wanted to use more complicated scenarios. (Both TokuDB and TokuMX are part of Percona’s product line, in the case you missed Tokutek now part of the Percona family.)

Thanks to Facebook – they provide LinkBench, a benchmark that emulates the social graph database workload. I made modifications to LinkBench, which are available here: The summary of modifications is

  • Instead of generating events in a loop, we generate events with requestrate and send the event for execution to one of available Requester thread.
  • At the start, we establish N (requesters) connections to database, which are idle by default, and just waiting for an incoming event to execute.
  • The main output of the benchmark is 99% response time for ADD_LINK (INSERT + UPDATE request) and GET_LINKS_LIST (range SELECT request) operations.
  • The related output is Concurrency, that is how many Requester threads are active during the time period.
  • Ability to report stats frequently (5-10 sec interval); so we can see a trend and a stability of the result.

Also, I provide a Java package, ready to execute, so you do not need to compile from source code. It is available on the release page at

So the main focus of the benchmark is the response time and its stability over time.

For an example, let’s see how TokuDB performs under different request rates (this was a quick run to demonstrate the benchmark abilities, not to provide numbers for TokuDB).

First graph is the 99% response time (in milliseconds), measured each 10 sec, for arrival rate 5000, 10000 and 15000 operations/sec:


or, to smooth spikes, the same graph, but with Log 10 scale for axe Y:

So there are two observations: the response time increases with an increase in the arrival rate (as it supposed to be), and there are periodical spikes in the response time.

And now we can graph Concurrency (how many Threads are busy working on requests)…

…with an explainable observation that more threads are needed to handle bigger arrival rates, and also during spikes all available 200 threads (it is configurable) become busy.

I am looking to adopt LinkBenchX to run an identical workload against MongoDB.
The current schema is simple

CREATE TABLE `linktable` (
  `id1` bigint(20) unsigned NOT NULL DEFAULT '0',
  `id2` bigint(20) unsigned NOT NULL DEFAULT '0',
  `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
  `visibility` tinyint(3) NOT NULL DEFAULT '0',
  `data` varchar(255) NOT NULL DEFAULT '',
  `time` bigint(20) unsigned NOT NULL DEFAULT '0',
  `version` int(11) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (link_type, `id1`,`id2`),
  KEY `id1_type` (`id1`,`link_type`,`visibility`,`time`,`id2`,`version`,`data`)
CREATE TABLE `counttable` (
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
  `count` int(10) unsigned NOT NULL DEFAULT '0',
  `time` bigint(20) unsigned NOT NULL DEFAULT '0',
  `version` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`link_type`)
CREATE TABLE `nodetable` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `type` int(10) unsigned NOT NULL,
  `version` bigint(20) unsigned NOT NULL,
  `time` int(10) unsigned NOT NULL,
  `data` mediumtext NOT NULL,

I am open for suggestions as to what is the proper design of documents for MongoDB – please leave your recommendations in the comments.

The post LinkBenchX: benchmark based on arrival request rate appeared first on MySQL Performance Blog.


MySQL 5.6 Transportable Tablespaces best practices

In MySQL 5.6 Oracle introduced a Transportable Tablespace feature (copying tablespaces to another server) and Percona Server adopted it for partial backups which means you can now take individual database or table backups and your destination server can be a vanilla MySQL server. Moreover, since Percona Server 5.6, innodb_import_table_from_xtrabackup is obsolete as Percona Server also implemented Oracle MySQL’s transportable tablespaces feature which as I mentioned gives you the ability to copy tablespace (table.ibd) between servers. Let me demonstrate this through one example where I am going to take partial backup of selective tables instead of an entire MySQL server and restore it on a running MySQL server on destination without taking it offline.

MySQL 5.6 Transportable Tablespaces with Percona XtraBackup
MySQL 5.6 Transportable Tablespaces with Percona XtraBackup
Percona XtraBackup is open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. You can backup up your databases without sacrificing read/write ability and, on top of that, Percona XtraBackup supports partial backup schemas that correspond to backup-only specific databases or tables instead of taking backups of the entire database server.

For partial backups, your source server from where you taking the backup must have the innodb_file_per_table option enabled and the importing server should have innodb_file_per_table and innodb_expand_import enabled  – or innodb_import_table_from_xtrabackup (only supported for Percona Server) depends on Percona Server version for the the last option for restoring the database tables. This is all valid till Percona Server version 5.5 and you can find further details about partial backups here. Percona CTO Vadim Tkachenko wrote nice post on it about how to copy InnoDB tables between servers on Percona Server prior to version 5.6.

I am going to use Percona Server 5.6 as it uses the feature of Transportable Tablespace. There are two tables under database irfan named as “test” and “dummy”. I am going to take backup of only test table as partial backup instead taking backup of entire database server.

mysql> show tables;
| Tables_in_irfan|
| dummy          |
| test           |
mysql> show create table test;
| Table | Create Table                                                                                                             |
| test  | CREATE TABLE `test` (
  `id` int(11) DEFAULT NULL,
  `name` char(15) DEFAULT NULL
1 row in set (0.00 sec)
mysql> SELECT COUNT(*) FROM test;
| COUNT(*) |
|     1000 |
1 row in set (0.00 sec)

I am going to use latest version of Percona XtraBackup which supports multiple ways to take partial backups that are –include option, –tables-file option and –databases option. I am going to use –tables-file option to take backup of specific database table.

irfan@source$ xtrabackup --version
xtrabackup version 2.2.5 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )
irfan@source$ mysql --skip-column-names -e "SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema IN ('irfan') AND TABLE_NAME = 'test';" > /root/tables_to_backup.txt
irfan@source$ cat tables_to_backup.txt

Now as below you need to take backup of irfan.test database table. Note, how –tables-file option passed to backup only irfan.test database table which takes backup of only specified database table in tables_to_backup.txt file.

irfan@source$ innobackupex --no-timestamp --tables-file=/root/tables_to_backup.txt /root/partial_backup/ > /root/xtrabackup.log 2>&1
irfan@source$ cat /root/xtrabackup.log
>> log scanned up to (2453801809)
>> log scanned up to (2453801809)
[01]        ...done
[01] Copying ./irfan/test.ibd to /root/partial_backup/irfan/test.ibd
[01]        ...done
xtrabackup: Creating suspend file '/root/partial_backup/xtrabackup_suspended_2' with pid '14442'
141101 12:37:27  innobackupex: Starting to backup non-InnoDB tables and files
innobackupex: in subdirectories of '/var/lib/mysql'
innobackupex: Backing up file '/var/lib/mysql/irfan/test.frm'
141101 12:37:27  innobackupex: Finished backing up non-InnoDB tables and files
141101 12:37:27  innobackupex: Executing LOCK BINLOG FOR BACKUP...
141101 12:37:27  innobackupex: Executing FLUSH ENGINE LOGS...
141101 12:37:27  innobackupex: Waiting for log copying to finish
xtrabackup: Creating suspend file '/root/partial_backup/xtrabackup_log_copied' with pid '14442'
xtrabackup: Transaction log of lsn (2453801809) to (2453801809) was copied.
141101 12:37:28  innobackupex: Executing UNLOCK BINLOG
141101 12:37:28  innobackupex: Executing UNLOCK TABLES
141101 12:37:28  innobackupex: All tables unlocked
141101 12:37:28  innobackupex: completed OK!

Successful backup with show you “completed OK” at the end of the backup. If you scripted the backup for automation you can also check backup status to see whether it succeeded or failed by checking exit status of the backup script.

For the next step, we need to prepare the backup because there might be uncomitted transactions that needs to rollback or transactions in the log to be replayed to backup. You need to mention –export option specifically to prepare backup in order to create table.exp and table.cfg files (prior to MySQL/PS 5.6). You can read more on it in documentation. You can prepare the backup as below.

irfan@source$ innobackupex --apply-log --export /root/partial_backup/ > /root/xtrabackup-prepare.log 2>&1
irfan@source$ cat /root/xtrabackup-prepare.log
xtrabackup version 2.2.5 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )
xtrabackup: auto-enabling --innodb-file-per-table due to the --export option
xtrabackup: cd to /root/partial_backup
xtrabackup: This target seems to be already prepared.
xtrabackup: export option is specified.
xtrabackup: export metadata of table 'irfan/test' to file `./irfan/test.exp` (1 indexes)
xtrabackup:     name=GEN_CLUST_INDEX, id.low=5043, page=3
xtrabackup: starting shutdown with innodb_fast_shutdown = 0
InnoDB: FTS optimize thread exiting.
InnoDB: Starting shutdown...
InnoDB: Shutdown completed; log sequence number 2453817454
141102 11:25:13  innobackupex: completed OK!

Again a successfully prepared backup should show you “completed OK” at end. Once the backup is prepared it means it’s usable and ready to restore. For that first create the same table structure on destination as source.

mysql [localhost] {msandbox} (irfan) > CREATE TABLE `test` (
    ->   `id` int(11) DEFAULT NULL,
    ->   `name` char(15) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.12 sec)
mysql [localhost] {msandbox} (irfan) > show tables;
| Tables_in_irfan |
| test           |
1 row in set (0.00 sec)
irfan@destination$ ls -la data/irfan/
total 116
drwx------ 2 root root  4096 Nov  2 16:29 .
drwx------ 6 root root  4096 Nov  2 16:23 ..
-rw-rw---- 1 root root  8586 Nov  2 16:29 test.frm
-rw-rw---- 1 root root 98304 Nov  2 16:29 test.ibd

Now discard the existing tablespace and copy the test.ibd and test.cfg which Percona XtraBackup produced after preparing the backup and import tablespace back from source to destination as illustrated below. As a side note, you may need to change the ownership of test.cfg file and test.ibd file to mysql in order to make it accessible from MySQL server.

mysql [localhost] {msandbox} (irfan) > ALTER TABLE test DISCARD TABLESPACE;
Query OK, 0 rows affected (0.10 sec)
irfan@source$ cd /root/partial_backup/irfan/
irfan@source$ scp test.cfg test.ibd root@destination:/root/sandboxes/msb_PS-5_6_21/data/irfan/
mysql [localhost] {msandbox} (irfan) > ALTER TABLE test IMPORT TABLESPACE;
Query OK, 0 rows affected (0.10 sec)
irfan@destination$ tail -30 data/msandbox.err
2014-11-02 16:32:53 2037 [Note] InnoDB: Importing tablespace for table 'irfan/test' that was exported from host 'Hostname unknown'
2014-11-02 16:32:53 2037 [Note] InnoDB: Phase I - Update all pages
2014-11-02 16:32:53 2037 [Note] InnoDB: Sync to disk
2014-11-02 16:32:53 2037 [Note] InnoDB: Sync to disk - done!
2014-11-02 16:32:53 2037 [Note] InnoDB: Phase III - Flush changes to disk
2014-11-02 16:32:53 2037 [Note] InnoDB: Phase IV - Flush complete
mysql [localhost] {msandbox} (irfan) > SELECt COUNT(*) FROM test;
| COUNT(*) |
|     1000 |

NOTE:  The .cfg file contains the InnoDB dictionary dump in special(binary) format for corresponding table.The .cfg file is not required to import a tablespace to MySQL 5.6 or Percona Server 5.6. A tablespace can be imported successfully even if it is from another server, but innodb will do schema validation if the corresponding .cfg file is present in the same directory.

Now, as you can see from the error log, that table is imported successfully on test database and changes to innodb tablespace completed correctly. My colleague Miguel Angel Nieto wrote a related post on this titled, “How to recover a single InnoDB table from a Full Backup.”

There is another method to copy a table from a running MySQL instance to another running MySQL server which is described in the MySQL manual. For completeness let me describe to you the procedure quickly.

MySQL 5.6 Transportable Tablespaces with FLUSH TABLES FOR EXPORT
On source server, I have table named “dummy” which i will copy to destination server.

mysql [localhost] {msandbox} (irfan) > show tables;
| Tables_in_test |
| dummy          |
| test           |
mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM dummy;
| COUNT(*) |
|      100 |

First, create the same table structure on destination server.

mysql [localhost] {msandbox} (irfan) > CREATE TABLE `dummy` (
`id` int(11) DEFAULT NULL,
`dummy` varchar(10) DEFAULT NULL
Query OK, 0 rows affected (0.07 sec)

On the destination server, discard the existing tablespace before importing tablespace from source server.

mysql [localhost] {msandbox} (irfan) > ALTER TABLE dummy DISCARD TABLESPACE;
Query OK, 0 rows affected (0.00 sec)

Run FLUSH TABLES FOR EXPORT on the source server to ensure all table changes flushed to the disk. It will block write transactions to the named table while only allowing read-only operations. This can be a problem on high-write workload systems as a table can be blocked for a longer period of time if your table is huge in size. While making backups with Percona XtraBackup you can manage this in a non-blocking way and it will also save binary log coordinates that can be useful in replication and disaster recovery scenario.

FLUSH TABLES FOR EXPORT is only applicable to Oracle MySQL 5.6/Percona Server 5.6 FLUSH TABLES FOR EXPORT and will produce table metadata file .cfg and tablespace file .ibd. Make sure you copy both table.cfg and table.ibd files before releasing lock. It’s worth mentioning that you shouldn’t logout from the mysql server before copying table cfg and table.ibd file otherwise it will release the lock. After copying table metadata file (.cfg) and tablespace (.ibd) file release the lock on source server.

# Don't terminate session after acquiring lock otherwise lock will be released.
mysql [localhost] {msandbox} (irfan) > FLUSH TABLES dummy FOR EXPORT;
Query OK, 0 rows affected (0.00 sec)
# open another terminal Need another session to copy table files.
irfan@source$ scp dummy.cfg dummy.ibd root@destination-server:/var/lib/mysql/irfan/
# Go back to first session.
mysql [localhost] {msandbox} (irfan) > UNLOCK TABLES;

On destination server import the tablespace and verify from mysql error log as below.

mysql [localhost] {msandbox} (test) > ALTER TABLE dummy IMPORT TABLESPACE;
Query OK, 0 rows affected (0.04 sec)
$ tail -f data/msandbox.err
2014-11-04 13:12:13 2061 [Note] InnoDB: Phase I - Update all pages
2014-11-04 13:12:13 2061 [Note] InnoDB: Sync to disk
2014-11-04 13:12:13 2061 [Note] InnoDB: Sync to disk - done!
2014-11-04 13:12:13 2061 [Note] InnoDB: Phase III - Flush changes to disk
2014-11-04 13:12:13 2061 [Note] InnoDB: Phase IV - Flush complete
mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM dummy;
| COUNT(*) |
|      100 |

Take into account that there are some LIMITATIONS when copying tablespace between servers which are briefly outlined in the MySQL manual

Transportable Tablespace is nice feature introduced in MySQL 5.6 and I described the use cases for that in this post. Prior to MySQL 5.6, you could backup/restore specific database tables via Percona XtraBackup’s partial backup feature (destination server should be Percona Server for this case).  Comments are welcome :)

The post MySQL 5.6 Transportable Tablespaces best practices appeared first on MySQL Performance Blog.


“How to monitor MySQL performance” with Percona Cloud Tools: June 25 webinar

We recently released a new version of Percona Cloud Tools with MySQL monitoring capabilities. Join me June 25 and learn the details about all of the great new features inside Percona Cloud Tools – which is now free in beta. The webinar is titled “Monitoring All (Yes, All!) MySQL Metrics with Percona Cloud Tools” and begins at 10 a.m. Pacific time.

In addition to MySQL metrics, Percona Cloud Tools also monitors OS performance-related stats. The new Percona-agent gathers metrics with fine granularity (up to once per second), so you are able to see any of these metrics updated real-time.

During the webinar I’ll explain how the new Percona-agent works and how to configure it. And I’ll demonstrate the standard dashboard with the most important MySQL metrics and how to read them to understand your MySQL performance.

Our goal with the new implementation was to make installation as easy as possible. Seriously it should not take so much effort as it has in the past to get visibility into your MySQL performance. We also wanted to provide as much visibility as possible.

Please take a moment and register now for the webinar. I also encourage you, if you haven’t already, to sign up for access to the free Percona Cloud Tools beta ahead of time. At the end of the next week’s webinar you’ll know how to use the Percona-agent and will be able to start monitoring MySQL in less than 15 minutes!

See you June 25 and in the meantime you can check out our previous related posts: “From zero to full visibility of MySQL in 3 minutes with Percona Cloud Tools” and “Introducing the 3-Minute MySQL Monitor.”

The post “How to monitor MySQL performance” with Percona Cloud Tools: June 25 webinar appeared first on MySQL Performance Blog.


Why did we develop percona-agent in Go?

We recently open-sourced our percona-agent and if you check out the source code, you’ll find that it is written in the Go programming language (aka Golang). For those not up to speed, the percona-agent is a real-time client-side agent for Percona Cloud Tools.

Our requirements are quite demanding for our agents. This one is software that works on a real production server, so it must be fast, reliable, lightweight and easy to distribute. Surprisingly enough, binaries compiled by Go fit these characteristics.

There are of course alternatives that we considered. On the scripting side: Perl, Python, PHP, Ruby et al. These are not necessarily fast, and the distribution is also interesting. We have enough experience with Percona Toolkit and Perl’s “modules dependencies hell.”

On a high-end level side, there is C / C++ and I am sure we could produce an efficient agent. However we also have experience in the distribution of Percona Server / Percona XtraDB Cluster / Percona XtraBackup. Well, I have had enough with different versions of Ubuntus, Red Hats, CentOSes and the rest of the flavors of Linux.

And, yeah, there is Java, but let me just say that we are not the Java sort of developers.

So what is so charming about Go? Go feels like a scripting language, but produces executable binaries. I see it as having the ability to attack performance issues on two sides. First is the performance of software developers: They are more productive working with scripting-like languages. Second is the performance of a final product: Native self-executable binaries are more efficient than a script running through a interpreter.

It is worth noting that included batteries (a set of packages that are coming with Go) are excellent, and in many cases that will be just enough to get you going and produce software that is quite complex. And if that is not enough, there is also a list of packages and projects for Go.

Of course, there are some limitations you will find in Go (some of them are temporary I hope). These are:

1. The list of supported platforms is limited… FreeBSD (release 8 and above), Linux (ARM not supported), Mac OS X and Windows. There are no packages for Solaris yet.
2. A good MySQL driver is still a work in progress. the most advanced is Go-MySQL-Driver
3. Go comes with built-in testing capabilities, but our testing enthusiast, Daniel, says it is not enough to build a complicated testing suite.
4. There is no support of “generics” (or “templates” if you are in C++ world). Basically it means that if you developed a data structure that works with integers, you will need to copy-paste-replace to make it working with floats. Yes, there are workarounds like using a “parent-to-everything” type “interface{}”, but often it is not efficient and just looks ugly.

There is also no automatic type-conversion between int and floats, so if you need to do complex math which involves ints and floats, you may end up with a lot back-and-forth conversions, i.e. int(math.Floor(t.epsilon*float64(t.count*2)))

To finish, I would like to invite you to my webinar, “Monitoring All (Yes, All!) MySQL Metrics with Percona Cloud Tools” on Wednesday, June 25 at 10 a.m. Pacific Daylight Time, where I will talk on the new features in Percona Cloud Tools, including our new percona-agent.

The post Why did we develop percona-agent in Go? appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by