Feb 21, 2012

DBD::mysql 4.014 breaks pt-table-checksum 2.0

DBD::mysql 4.014 breaks pt-table-checksum 2.0.  The cause is unknown, but the effect is a lot of errors like:

DBD::mysql::st execute failed: called with 2 bind variables when 6 are needed [for Statement "..." with ParamValues: ...] at ./pt-table-checksum line 7216.

The fix is simple: upgrade (or even downgrade) DBD::mysql to any version except 4.014. To see which version of DBD::mysql a system has, execute:

perl -MDBD::mysql -e 'print $DBD::mysql::VERSION, "\n";'
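
If a system has the broken 4.014, one way to move to the latest release is through CPAN. This is a sketch; if DBD::mysql came from your distribution’s packages, change versions with yum, apt, or similar instead:

cpan DBD::mysql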

This bug may affect other Percona Toolkit tools, but currently pt-table-checksum 2.0 is the only known victim. The bug does not affect pt-table-checksum 1.0, and it cannot be worked around in pt-table-checksum 2.0 because the bug is in DBD::mysql itself.

This bug affects pt-table-checksum 2.0 but not 1.0 because the newer version uses prepared statements with parameter values, whereas the older version does not. It seems, although I have not verified this, that DBD::mysql 4.014 has some sort of caching mechanism which causes it to use the wrong prepared statement.

Feb 20, 2012

How to recover deleted rows from an InnoDB Tablespace

In my previous post I explained how it is possible, in some specific cases, to recover a single table from a full backup in order to save time and make the recovery process more straightforward. Now the scenario is worse because we don’t have a backup, or the backup restore process doesn’t work. How can I recover deleted rows?

We’re going to follow the same example as in my previous post, so we need to delete the records of employee 10008 from the table “salaries”. After the “accidental” deletion of rows you should stop MySQL, take a copy of salaries.ibd, and start MySQL again. Later, we’ll extract those deleted rows from the ibd file and import them into the database. The time between the deletion of the rows and the database stop is crucial: if pages are reused, you can’t recover the data.

The tool that we’re going to use to achieve this objective is the Percona Data Recovery Tool for InnoDB. This is the tool that we use in our data recovery service, and it’s open source.

I’m going to explain the full process in four different steps to make it clearer:

1- Extract all the InnoDB pages from the tablespace:

First we need to download the Percona Data Recovery Tool and compile all the tools using the “make” command. In this example I’m going to install the tools in the /root/recovery-tool folder, and the data, like tablespaces and recovered rows, in /root/recovery-tool/data.

After the compile process we need to copy the salaries.ibd tablespace to the recovery tool’s data directory. In order to extract all the pages we’ll use the page_parser tool. This tool finds and extracts all the pages of the tablespace to an output directory. We only need to specify the row format (-5) and where our tablespace is located (-f).

The row format can be -4 (REDUNDANT) or -5 (COMPACT). Since MySQL 5.0.3 the default format is COMPACT. More information about row formats is available at the following link:

http://dev.mysql.com/doc/refman/5.5/en/innodb-physical-record.html

You can also get the table row format from the Information Schema:

mysql (information_schema) > SELECT ROW_FORMAT from TABLES WHERE TABLE_SCHEMA='employees' AND TABLE_NAME='salaries';
+------------+
| ROW_FORMAT |
+------------+
| Compact    |
+------------+

~/recovery-tool# ./page_parser -5 -f data/salaries.ibd
Opening file: data/salaries.ibd:
[...]
71.43% done. 2012-02-14 13:10:08 ETA(in 00:00 hours). Processing speed: 104857600 B/sec

All pages are stored in a single directory with some subdirectories inside, one for every single index in the table:

~/recovery-tool# ls pages-1329221407/FIL_PAGE_INDEX/
0-26 0-27

In this case, there are two indexes with the IDs 0-26 and 0-27. InnoDB uses a clustered primary key, that is, the row data is organized along with the primary key. Therefore, if we want to extract the row data we need to identify which of those two indexes is the primary key. This is our next step.

2- Identifying the Primary Key

There are different methods to find the correct index and here I’m going to explain three of them:

A) INNODB_SYS_INDEXES

Percona Server has some extra tables in INFORMATION_SCHEMA that can help us find the different indexes and their types.

mysql (information_schema) > select i.INDEX_ID, i.NAME FROM INNODB_SYS_INDEXES as i INNER JOIN INNODB_SYS_TABLES as t USING(TABLE_ID) WHERE t.NAME='salaries';
+----------+---------+
| INDEX_ID | NAME    |
+----------+---------+
|       26 | PRIMARY |
|       27 | emp_no  |
+----------+---------+

B) InnoDB Table Monitor

The index information can also be taken directly from MySQL using the InnoDB Table Monitor. This monitor writes all the information related to tables and indexes (with their IDs) to the error log.

mysql (employees) > CREATE TABLE innodb_table_monitor (id int) ENGINE=InnoDB;

TABLE: name employees/salaries, id 18, flags 1, columns 7, indexes 2, appr.rows 2844513
[...]
INDEX: name PRIMARY, id 26, fields 2/6, uniq 2, type 3
root page 3, appr.key vals 2844513, leaf pages 6078, size pages 6120
FIELDS: emp_no from_date DB_TRX_ID DB_ROLL_PTR salary to_date
INDEX: name emp_no, id 27, fields 1/2, uniq 2, type 0
root page 4, appr.key vals 306195, leaf pages 2189, size pages 2212
FIELDS: emp_no from_date
[...]

This second method gives the same result: 0-26 is our primary key. After identifying the primary key, don’t forget to remove the innodb_table_monitor table.
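
Dropping the table is all it takes to turn the monitor off:

mysql (employees) > DROP TABLE innodb_table_monitor;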

C) Check the size on disk of every index

This is very dependent on the table schema, but normally the primary key will be larger on disk because it also stores the rows themselves.

~/recovery-tool/pages-1329221407/FIL_PAGE_INDEX# du -hs 0-26/
96M 0-26/
~/recovery-tool/pages-1329221407/FIL_PAGE_INDEX# du -hs 0-27/
35M 0-27/

In our example, 0-26 seems to be the primary key.

3- Extract the rows

We know which index holds the data, so the next step is clear: extract the rows from it. To accomplish this task we’ll use the constraints_parser command. In order to use it, the tool needs to know the table schema structure, that is, the column types, names, and attributes. This information needs to be available in the header file recovery-tool/include/table_defs.h, so it will be necessary to recompile the tool.

To help us with the task of converting the schema definition to a C header file there is another tool named create_defs.pl. It connects to the database, examines the table, and creates the table_defs.h content.

~/recovery-tool# ./create_defs.pl --host 127.0.0.1 --port 5520 --user root --password msandbox --db employees --table salaries > include/table_defs.h

More information about the table_defs.h format in the following link:

http://www.percona.com/docs/wiki/innodb-data-recovery-tool:mysql-data-recovery:generating_a_table_definition

With the table definition in table_defs.h, it’s time to compile all the tools again with the “make” command. After the compile process we can use constraints_parser to recover the rows in a human-readable format.

~/recovery-tool# ./constraints_parser -5 -D -f pages-1329221407/FIL_PAGE_INDEX/0-26/ > data/salaries.recovery

With the -D option we ask constraints_parser to recover only deleted records. -5 and -f are the same options that we used before with page_parser.

In salaries.recovery you can find a lot of deleted rows, not only the rows deleted accidentally. You should manually find the data you need to recover and save it in another file. This is the output of our example:

~/recovery-tool# cat data/salaries.recovery
salaries 10008 "1998-03-11" 46671 "1999-03-11"
salaries 10008 "1999-03-11" 48584 "2000-03-10"
salaries 10008 "2000-03-10" 52668 "2000-07-31"
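
In our example the file already contains only employee 10008’s rows, but on a real system it can hold far more recovered data; a simple filter helps isolate what you need. A sketch, matching on the employee number:

~/recovery-tool# grep -w 10008 data/salaries.recovery > data/salaries.10008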

4- Import the rows

With the data in our hands the last step is to import it in our database:

mysql (employees) > LOAD DATA INFILE '/root/recovery-tool/data/salaries.recovery' REPLACE INTO TABLE `salaries` FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES STARTING BY 'salaries\t' (emp_no, from_date, salary, to_date);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 0

mysql (employees) > select * from salaries where emp_no=10008;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  10008 |  46671 | 1998-03-11 | 1999-03-11 |
|  10008 |  48584 | 1999-03-11 | 2000-03-10 |
|  10008 |  52668 | 2000-03-10 | 2000-07-31 |
+--------+--------+------------+------------+

Data recovered! Employee 10008 has his salary back again :)

Conclusion

With InnoDB, deleted rows are not lost. You can recover them from the original tablespace, and also from an ibd file if you have a binary backup. Just use constraints_parser without the -D (deleted) option and you will recover all the data that exists inside that tablespace.

Feb 20, 2012

How to convert MySQL’s SHOW PROFILES into a real profile

SHOW PROFILES shows how much time MySQL spends in various phases of query execution, but it isn’t a full-featured profile. By that, I mean that it doesn’t show similar phases aggregated together, doesn’t sort them by worst-first, and doesn’t show the relative amount of time consumed.

I’ll profile the “nicer_but_slower_film_list” view included with the Sakila sample database to demonstrate:

mysql> SET profiling=1;
mysql> pager cat > /dev/null
mysql> SELECT * FROM nicer_but_slower_film_list;
997 rows in set (0.18 sec)

The query consumed 0.18 seconds. Where did the time go?

mysql> SHOW PROFILE FOR QUERY 1;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000032 |
| checking permissions | 0.000005 |
... snip ...
| init                 | 0.000021 |
| optimizing           | 0.000003 |
| statistics           | 0.000006 |
| cleaning up          | 0.000003 |
+----------------------+----------+
35 rows in set (0.00 sec)

It’s pretty hard to figure out what consumed the most time, because the output is sorted in execution order, not order of time consumption. Here is one query that can partially solve this:


SET @query_id := 1;
SELECT STATE, SUM(DURATION) AS Total_R,
       ROUND(
          100 * SUM(DURATION) /
             (SELECT SUM(DURATION)
              FROM INFORMATION_SCHEMA.PROFILING
              WHERE QUERY_ID = @query_id
          ), 2) AS Pct_R,
       COUNT(*) AS Calls,
       SUM(DURATION) / COUNT(*) AS "R/Call"
    FROM INFORMATION_SCHEMA.PROFILING
    WHERE QUERY_ID = @query_id
    GROUP BY STATE
    ORDER BY Total_R DESC; 

+----------------------+----------+-------+-------+--------------+
| STATE                | Total_R  | Pct_R | Calls | R/Call       |
+----------------------+----------+-------+-------+--------------+
| removing tmp table   | 0.095135 | 55.10 |     3 | 0.0317116667 |
| Copying to tmp table | 0.046175 | 26.74 |     1 | 0.0461750000 |
| Sending data         | 0.018478 | 10.70 |     3 | 0.0061593333 |
| Sorting result       | 0.011090 |  6.42 |     1 | 0.0110900000 |
| checking permissions | 0.000802 |  0.46 |     6 | 0.0001336667 |
| Creating tmp table   | 0.000548 |  0.32 |     1 | 0.0005480000 |
| Opening tables       | 0.000196 |  0.11 |     1 | 0.0001960000 |
| statistics           | 0.000071 |  0.04 |     2 | 0.0000355000 |
| starting             | 0.000032 |  0.02 |     1 | 0.0000320000 |
| freeing items        | 0.000027 |  0.02 |     2 | 0.0000135000 |
| preparing            | 0.000025 |  0.01 |     2 | 0.0000125000 |
| init                 | 0.000021 |  0.01 |     1 | 0.0000210000 |
| optimizing           | 0.000020 |  0.01 |     2 | 0.0000100000 |
| closing tables       | 0.000014 |  0.01 |     2 | 0.0000070000 |
| System lock          | 0.000010 |  0.01 |     1 | 0.0000100000 |
| query end            | 0.000003 |  0.00 |     1 | 0.0000030000 |
| cleaning up          | 0.000003 |  0.00 |     1 | 0.0000030000 |
| executing            | 0.000002 |  0.00 |     2 | 0.0000010000 |
| end                  | 0.000002 |  0.00 |     1 | 0.0000020000 |
| logging slow query   | 0.000001 |  0.00 |     1 | 0.0000010000 |
+----------------------+----------+-------+-------+--------------+

Much nicer. Now you can see that over 3/4ths of the time was spent working with temporary tables.

But there’s something still missing: it doesn’t show lost time, the amount of time not accounted for by the profiling. That is, the SUM(DURATION) isn’t the same as the total query duration. Alas, there is no query-level table that would allow me to subtract the SUM(DURATION) from the query’s real response time. If there were, I could add in a UNION to inject another row for “lost time” and show the portion of response time that wasn’t captured in the profile.
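
You can still compute the lost time by hand, though: take the wall-clock response time the client reported (0.18 seconds above) and subtract the profile’s total. A minimal sketch, with the 0.18 figure hard-coded as an assumption:

SET @query_id := 1;
SELECT 0.18 - SUM(DURATION) AS lost_time
  FROM INFORMATION_SCHEMA.PROFILING
 WHERE QUERY_ID = @query_id;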

The above query is something I developed for High Performance MySQL Third Edition, by the way. The book should be available in a matter of weeks.

Feb 20, 2012

MySQL Meetups in North Carolina this week

A quick reminder: I’m speaking at two MySQL meetups in North Carolina this week, Tuesday, Feb 21 at the Raleigh MySQL/PHP Meetup and Wednesday, Feb 22 at Charlotte Queen City PHP. Note that the second meetup’s date has changed; it was originally planned for February 23 but had to be rescheduled due to conflicts.

Feb 19, 2012

Why don’t our new Nagios plugins use caching?

In response to the release of our new MySQL monitoring plugins on Friday, one commenter asked why the new Nagios plugins don’t use caching. It’s worth answering in a post rather than a comment, because there is an important principle that needs to be understood to monitor servers correctly. But first, some history.

When I wrote a set of high-quality Cacti templates for MySQL a few years ago (which are now replaced by the new project), making the Cacti templates use caching was important for two reasons:

  1. Performance. Cacti runs some of its polling processes serially, so if each graph has to reach out to the MySQL server and retrieve a bunch of data, the polling can take too long. I’ve seen cases where a Cacti server that’s graphing too many MySQL servers doesn’t finish the current iteration before the next one starts. Making sure the data isn’t retrieved from MySQL repeatedly cuts a lot of latency out of the process. In fact, I went further than that; I didn’t cache the data retrieved from MySQL, I cached the resulting metrics. This avoids having to do InnoDB parsing repeatedly, too. With this approach, Cacti is capable of monitoring many more servers.
  2. Consistency. Graphs are supposed to be generated at a single instant in time, but as I just said, they actually aren’t. Graphs that show slightly different data, or, in some cases I observed, greatly different data, are frustrating at best and wrong or misleading at worst. When you’re using these graphs for forensic or diagnostic purposes, this is very bad. Caching the results from the first call and reusing them for all subsequent polls avoids this problem. All of the graphs are generated from one and only one set of data retrieved from MySQL.

It’s worth noting that Cacti doesn’t do this itself, which I consider to be a Cacti design shortcoming. I did not like writing my own cache management code. Handling cached data correctly is one of the two hard things in programming (choosing proper variable names is the other one, of course).

Now, somewhere along the line “you should cache when you are monitoring a system for failures” became a tacit best practice. I don’t know why, but I can speculate that there might have been a misunderstanding — perhaps caching was understood to reduce load on MySQL, rather than reducing load on the Cacti server and speeding up the poller.

This is a very bad idea! I do NOT advocate caching for Nagios checks. Why? Exactly the inverse of the two reasons that graphing should be cached!

  1. Performance doesn’t matter. First, most Nagios checks are run once every five minutes, or once a minute at most. Running SHOW STATUS infrequently doesn’t add load to the server. Most real production installations I see actually have multiple people running innotop or what have you, once a second! Second, if your monitoring is to be at all useful, you should not monitor too many things inside of MySQL. If you take the typical mass-market monitoring program’s silly “alert on 99 different kinds of cache hit ratio thresholds” approach, you’ll soon have sysadmins with /dev/null email filters, which will make your monitoring system utterly useless. However, even if you have multiple checks running once a minute, it’s still a trivial amount of load.
  2. Consistency is just another word for staleness. Great, your monitoring system thinks that the server is fine, based on a three-minute-old file — what good is that? You should be monitoring the server, as of NOW, rather than monitoring the contents of some file as of several minutes ago.

In conclusion, you should cache for historical metrics monitoring, but not for fault detection monitoring. That’s why the new Nagios plugins don’t have any caching functionality.

Feb 18, 2012

Fake Yzmir Ramirez Facebook Account

http://www.facebook.com/yzmir.ramirez is not me. It looks like it’s another fake account.

Feb 18, 2012

MariaDB 5.3.4 benchmarks

MariaDB 5.3 has reached the release candidate milestone, and the 5.3 version promises a lot of new features and optimizations (e.g., in the optimizer: http://kb.askmonty.org/en/what-is-mariadb-53#query-optimizer). No surprise that I wanted to check how all the improvements affect general performance.

So why don’t we run the good old sysbench benchmark?

For the benchmark I took:

  • HP ProLiant DL380 G6 box
  • sysbench multi-table OLTP read-write workload, 16 tables, 500mil rows each, total data size about 30GB
  • working threads from 1 to 256
  • Versions: MariaDB 5.3.4, MySQL 5.5.20
  • Data is stored on RAID10 HDD partition
  • Like in all my recent benchmarks, I took throughput measurements every 10 seconds, so we can see the stability of the throughput (a sketch of a comparable invocation follows this list)
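
For reference, a multi-table OLTP run along these lines could be launched as shown below. This is a sketch only: the table size, credentials, and run time here are illustrative assumptions; the exact configuration and scripts used are in the Launchpad repository mentioned next.

# load the tables first by running the same command with 'prepare' instead of 'run'
sysbench --test=tests/db/oltp.lua --oltp-tables-count=16 \
    --oltp-table-size=5000000 \
    --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
    --num-threads=16 --max-time=600 --max-requests=0 \
    --report-interval=10 run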

The raw results, configuration, and scripts are available on our Benchmarks Launchpad.

The graphical results:

Throughput (more is better)

Threads MariaDB 5.3.4 MySQL 5.5.20  Ratio
1        252           271          0.9298893
2        412           588          0.7006803
4        801          1097          0.7301732
8       1709          2205          0.7750567
16      3197          4076          0.7843474
32      3303          4166          0.7928469
64      3336          4150          0.8038554
128     3800          4170          0.9112710
256     3710          4131          0.8980876

I was surprised to see that MariaDB shows 20-30% worse throughput. It seems many of the changes resulted in a general performance hit. I wonder whether the MariaDB team runs performance regression benchmarks, and if they do, why we see such a performance decline.


Feb 18, 2012

Can we improve MySQL variable handling?

MySQL settings (also known as server variables) have an interesting property. When you set a variable in a running server, the change is not persisted in any way, and the server will be back to the old value upon restart. MySQL also does not have an option to re-read its config file without restarting, as some other software does, so the approach of changing the config file and then instructing the server to re-read it does not work either. This leads to runtime settings being different from the settings in the config file, and makes an unexpected change on restart a frequent problem.

pt-config-diff is a tool which can help with this problem a lot, being able to compare the settings in my.cnf to those the server is currently running with. The problem, however, is that this only works well if the settings are set in my.cnf: if the default was used and we changed it at run time, we can’t detect such a change easily, because MySQL Server does not seem to have an easy way to check what the default value for a given server variable was.

The only way I’m aware of is running the server from the command line with the --no-defaults --verbose --help options:

pz@ubuntu:~$ /usr/sbin/mysqld --no-defaults --verbose --help
...
timed-mutexes                                     FALSE
tmp-table-size                                    16777216
tmpdir                                            /tmp
transaction-alloc-block-size                      8192
transaction-isolation                             REPEATABLE-READ
transaction-prealloc-size                         4096
updatable-views-with-limit                        YES
userstat                                          FALSE
verbose                                           TRUE
wait-timeout                                      28800

Which is however rather ugly and only works with shell access to the server which is not always the case.

Interestingly enough, MySQL allows you to SET a variable to its default value (the compile-time default, not the one the server was started with), yet there seems to be no way to read that default:

mysql> set global sort_buffer_size=DEFAULT;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@global.sort_buffer_size;
+---------------------------+
| @@global.sort_buffer_size |
+---------------------------+
|                   2097152 |
+---------------------------+
1 row in set (0.00 sec)

This could be used as a technique to detect the DEFAULT value of SESSION variables, yet for some GLOBAL variables setting them back and forth would not be safe.
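
For SESSION scope, a save-and-restore probe is harmless; a minimal sketch:

mysql> SET @saved := @@session.sort_buffer_size;
mysql> SET SESSION sort_buffer_size = DEFAULT;
mysql> SELECT @@session.sort_buffer_size AS compiled_in_default;
mysql> SET SESSION sort_buffer_size = @saved;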

A simple change which would make dealing with MySQL variables in an automated way a lot more convenient would be extending INFORMATION_SCHEMA.GLOBAL_VARIABLES. Currently, as of MySQL 5.5, it contains only the variable name and value. I would suggest adding a few more columns, such as DEFAULT, to hold the compile-time default value for the variable, and STARTUP, to hold the value the server was started with.

It also might be a good idea to extend the SELECT syntax to ease querying of a variable’s global value. Right now I can select:

mysql> select @@global.sort_buffer_size;
+---------------------------+
| @@global.sort_buffer_size |
+---------------------------+
|                   2097152 |
+---------------------------+
1 row in set (0.00 sec)

If I could refer to “default” or “startup” in addition to the “global” and “session” prefixes which are available now, it would be quite nice.

Feb 17, 2012

The relationship between Innodb Log checkpointing and dirty Buffer pool pages

This is a time-honored topic, and there’s no shortage of articles about it on this blog. I wanted to write a post trying to condense and clarify those posts, as it has taken me a while to really understand this relationship.

Some basic facts

  • Most of us know that writing into Innodb updates buffer pool pages in memory and records page operations in the transaction (redo) log.
  • Behind the scenes those updated (dirty) buffer pool pages are flushed down to the tablespace.
  • If Innodb stops (read: crashes) with dirty buffer pool pages, Innodb recovery must be done to rebuild the last consistent picture of the database.
  • Recovery uses the transaction log by redoing (hence the name ‘redo log’) the page operations in the log that had not already been flushed to the tablespaces.

Ultimately this mechanism was an optimization for slow drives:  if you can sequentially write all the changes into a log, it will be faster to do on the fly as transactions come in than trying to randomly write the changes across the tablespaces.  Sequential IO trumps Random IO.

However, even today in our modern flash storage world where random IO is significantly less expensive (from a latency perspective, not dollars), this is still an optimization because the longer we delay updating the tablespace, the more IOPs we can potentially conserve, condense, merge, etc.  This is because:

  • The same row may be written multiple times before the page is flushed
  • Multiple rows within the same page can be written before the page is flushed

Innodb Log Checkpointing

So, first of all, what can we see about Innodb log checkpointing and what does it tell us?

mysql> SHOW ENGINE INNODB STATUS\G
---
LOG
---
Log sequence number 9682004056
Log flushed up to   9682004056
Last checkpoint at  9682002296

This shows us the virtual head of our log (Log sequence number), the last place the log was flushed to disk (Log flushed up to), and our last checkpoint. The LSN grows forever, while the actual locations inside the transaction logs are reused in a circular fashion. Based on these numbers, we can determine how many bytes back in the transaction log our oldest uncheckpointed transaction is by subtracting the ‘Last checkpoint at’ value from the ‘Log sequence number’ (here, 9682004056 - 9682002296 = 1760 bytes). More on what a checkpoint is in a minute. If you use Percona Server, it does the math for you by including some more output:

---
LOG
---
Log sequence number 9682004056
Log flushed up to   9682004056
Last checkpoint at  9682002296
Max checkpoint age    108005254
Checkpoint age target 104630090
Modified age          1760
Checkpoint age        1760

Probably most interesting here is the Checkpoint age, which is the subtraction I described above.  I think of the Max checkpoint age as roughly the furthest back Innodb will allow us to go in the transaction logs; our Checkpoint age cannot exceed this without blocking client operations in Innodb to flush dirty buffers.  Max checkpoint age appears to be approximately 80% of the total number of bytes in all the transaction logs, but I’m unsure if that’s always the case.

Remember our transaction logs are circular, and the checkpoint age represents how far back the oldest unflushed transaction is in the log.  We cannot overwrite that without potentially losing data on a crash, so Innodb does not permit such an operation and will block incoming writes until the space is available to continue (safely) writing in the log.
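
On a stock MySQL server you can do the subtraction yourself; a quick sketch (assuming the mysql client can log in without prompting):

mysql -e 'SHOW ENGINE INNODB STATUS\G' | awk '
    /Log sequence number/ { lsn  = $NF }   # virtual head of the log
    /Last checkpoint at/  { ckpt = $NF }   # oldest unflushed position
    END { print "Checkpoint age:", lsn - ckpt, "bytes" }'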

Dirty Buffer Pool Pages

On the other side, we have dirty buffers. Two numbers from the BUFFER POOL AND MEMORY section of SHOW ENGINE INNODB STATUS are relevant:

Database pages          65530
...
Modified db pages       3

So we have 3 pages that have modified data in them, and that (in this case) is a very small percentage of the total buffer pool.  A page in Innodb contains rows, indexes, etc., while a transaction may modify 1 or millions of rows.  Also realize that a single modified page in the buffer pool may contain modified data from multiple transactions in the transaction log.

As I said before, dirty pages are flushed to disk in the background.  The order in which they are flushed really has little to nothing to do with the transaction they are associated with, nor with the position associated with their modification in the transaction log.    The effect of this is that as the thread managing the dirty page flushing goes about its business, it is not necessarily flushing to optimize the Checkpoint age, it is flushing to try to optimize IO and to obey the LRU in the buffer pool.

Since buffers can and will be flushed out of order, it may be the case that there are a lot of transactions in the transaction log that are fully flushed to disk (i.e., all the pages associated with said transaction are clean), but there still could be  older transactions that are not flushed.  This, in essence, is what fuzzy checkpointing is.

The checkpoint process is really a logical operation.  It occasionally  (as chunks of dirty pages get flushed) has a look through the dirty pages in the buffer pool to find the one with the oldest LSN, and that’s the Checkpoint.  Everything older must be fully flushed.

The main reason this is important is that if the Checkpoint age is not a factor in dirty buffer flushing, it can get too big and cause stalls in client operations: the algorithm that decides which dirty pages to flush does not optimize for this [well], and sometimes it is not good enough on its own.

So, how can we optimize here? The short of it is: make Innodb flush more dirty pages. However, I can’t help but wonder if some tweaks could be made to the page flushing algorithm to be more effective at choosing older dirty pages. It is not clear how that algorithm works without reading the source code.

There are a lot of ways to tune this. Here is a list of the most significant, roughly ordered from oldest to newest, and simultaneously from least effective to most effective:

  • innodb_max_dirty_pages_pct:  This attempts to keep the percentage of dirty pages under control, and before the Innodb plugin this was really the only way to tune dirty buffer flushing. However, I have seen servers with 3% dirty buffers that are hitting their max checkpoint age. The way this increases dirty buffer flushing also doesn’t scale well on high-IO subsystems; it effectively just doubles the dirty buffer flushing per second when the % of dirty pages exceeds this amount.
  • innodb_io_capacity: This setting, in spite of all our grand hopes that it would allow Innodb to make better use of our IO in all operations, simply controls the amount of dirty page flushing per second (and other background tasks like read-ahead). Make this bigger, and you flush more per second. This does not adapt; it simply does that many iops every second if there are dirty buffers to flush. It will effectively eliminate any optimization of IO consolidation if you have a low enough write workload (that is, dirty pages get flushed almost immediately; we might be better off without a transaction log in this case). It also can quickly starve data reads and writes to the transaction log if you set this too high.
  • innodb_write_io_threads: Controls how many threads will have writes in progress to the disk.   I’m not sure why this is still useful if you can use Linux native AIO.  These can also be rendered useless by filesystems that don’t allow parallel writing to the same file by more than one thread (particularly if you have relatively few tables and/or use the global tablespaces) *cough ext3 cough*.
  • innodb_adaptive_flushing: An Innodb plugin/5.5 setting that tries to be smarter about flushing more aggressively based on the number of dirty pages and the rate of transaction log growth.
  • innodb_adaptive_flushing_method:  (Percona Server only)  This adds a few new different algorithms, but the more effective ones adjust the amount of dirty page flushing using a formula that considers the Checkpoint age and the Checkpoint age target (something you can set in Percona Server, otherwise it is effectively just a hair under the Max Checkpoint age).  The two main methods here would be ‘estimate’ (good default) and ‘keep_average’ designed for SSD type storage.  Running Percona Server and using this method is my go-to setting for managing Checkpoint age.
To be clear, the best we have today (IMO) is using innodb_adaptive_flushing_method in Percona Server.

In any case, if you run any kind of production MySQL server, you should be monitoring your Checkpoint age and your Innodb dirty pages, and try to see the relationship between those values and your write operations on disk. The additional controls in 5.5 and Percona Server are excellent reasons to consider an upgrade.
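
A starting point for the dirty-page side of that monitoring, using plain status counters (the checkpoint age side is covered by the sketch earlier in this post):

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';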

Feb 17, 2012

Announcing MySQL Monitoring Plugins from Percona

We’ve released a new set of monitoring plugins for MySQL servers and related software. With these plugins, you can set up world-class graphing and monitoring for your MySQL servers, using your own on-premises Cacti and Nagios software. The Cacti plugins are derived from an existing set of templates we’ve been using for several years, but the Nagios check plugins are brand new. They are informed by the research we did into the causes and prevention of MySQL downtime.

Like all Percona software, the plugins are open-source and free, licensed under the GNU GPL. The source code and issue tracker are hosted at Launchpad. The 0.9.0 release is ready to download now, and the reference manual is also online. In the next release we will integrate this into our package management, so you can install through YUM and APT repositories.

The new monitoring plugins add first-rate support for MySQL to popular enterprise open-source monitoring systems, and that’s why we’ve added them to the list of software included in our support contracts. If you’re an existing customer of Percona’s MySQL Support services, your contract automatically covers these new plugins, too; you don’t need a new contract for that. Under the support contract, you’ll get help installing, configuring, troubleshooting, and administering your monitoring plugins, as well as being entitled to bug fixes. And as always, we’re also available to extend the plugins to support your environment or broaden the range of systems and software that they can monitor.

Please use the Launchpad bug tracker to report issues. Happy monitoring!
