Feb
22
2012
--

Benchmarks of Intel 320 SSD 600GB

I have a chance to test a system with Intel 320 SSD drives (NewRelic provided me with an access to the server), and compare performance with SAS hard drives.

System specification

  • Dell PowerEdge R610
  • Memory: 48GB
  • CPU: Intel(R) Xeon(R) CPU X5650
  • RAID controller: Perc H800
  • RAID configuration: RAID 5 over 11 disks + 1 hot spare. RAID 5 is chosen for space purposes. In this configuration using 600GB disk, we can get 5.5T of useful space
  • Intel drives: Intel 320 SSD 600GB
  • HDD drives: Seagate Cheetah 15K 600GB 16MB Cache SAS
  • Filesystem: XFS, mkfs.xfs -s size=4096, mount -o nobarrier

Benchmark:
For the benchmark I took a sysbench uniform oltp rw workload. 256 tables, 50mil rows each, which gives in total 3T of data.
To vary a ratio memory/data I will vary an amount of tables from 256 (3TB) to 32 (375GB).
As a backend database I use Percona Server 5.5.19.

I should mention that on these datasizes, sysbench workload is pretty nasty, MySQL will mostly reads and writes pages from buffer pool (replacing pages in buffer pool). This however allows us to see the best possible scenario for SSD running under MySQL, the final result will show the best possible gain.
I do measurements every 10 sec to see stability of results.

Graphical result:

Tabular:

Tables HDD SDD Ratio
32   1226 1644 1.340946
64    140  571 4.078571
96    101  506 5.009901
128    89  486 5.460674
192    79  484 6.126582
256    75  495 6.600000

As you can see, on the big datasizes we have 5-6x improvement. However on 32 tables (375GB of data), the result became unstable.

There is a graph with time series with 10 sec measurements.

It looks like we are having symptoms of the flushing problem. This is to investigate later.

The scripts and raw results are on Benchmarks Launchpad.


Feb
22
2012
--

How to Monitor MySQL with Percona’s Nagios Plugins

In this post, I’ll cover the new MySQL monitoring plugins we created for Nagios, and explain their features and intended purpose.

I want to add a little context. What problem were we trying to solve with these plugins? Why yet another set of MySQL monitoring plugins?

The typical problem with Nagios monitoring (and indeed with most other monitoring systems) is the alert spam. I’ve spoken to a lot of users with monitoring systems over the years, and everyone says that they get a lot of false alarms, and that they have email filters set up for alerts. Recently I spoke to one person who gets about thirty thousand alerts per week (!!!) in his email. I’m sure you can see that this makes the monitoring system much less effective. The usual cause of such noisy monitoring is when you alert on things that don’t really indicate problems.

To avoid this problem as much as possible, I suggest the following:

  1. Alert only on things that reliably indicate a real problem. Do not alert on a threshold that merely indicates something is abnormal; abnormal things happen constantly. It’s useful to spend time investigating abnormalities, but in general, you should try to make sure that doesn’t creep into your fault-detection alerts.
  2. Alert only on things that are actionable. If you get an alert and you can’t do anything about it, that’s a certain recipe for a /dev/null email filter.
  3. Avoid ratios and thresholds as much as possible, because they are rarely actionable or reliable indicators of problems, and it’s difficult or impossible to find the right threshold for all situations. This suggestion is a consequence of the previous two, but it’s worth calling out separately in my opinion. I’ll highlight a couple of threshold-based alerts later in this blog post, and explain the need to be careful with them.

Although we’re not finished with the plugins, we have a usable starting point, and we’ll build on it as time passes. The plugins we created are based on our analysis of several hundred customer downtime incidents, and their causes and preventions. You can read more about that research in our white papers, here and here.

If you read those white papers, you’ll see a pretty extensive list of suggestions for what to monitor in MySQL. Those are only suggestions; any given deployment might want to monitor only some of them, as appropriate, and might need to include others as well. Here’s how you can use our new monitoring plugins to check for some of the conditions that caused downtime for our customers:

  • Failed or full LVM snapshots. LVM backup strategies can fail to delete snapshots after finishing, sometimes due to the backup script failing before completion. Additionally, LVM snapshots that don’t have enough copy-on-write space can fail before the backup completes. The pmp-check-lvm-snapshots plugin will alert when a snapshot is failed or overly full.
  • Too many deadlocks. Deadlocks (and lock wait timeouts, covered in another plugin) are frequent causes of serious errors in applications because the applications often don’t handle deadlocks correctly, and in many cases also don’t log the resulting errors. The best place to fix these problems is in the application, but the DBA really needs to be monitoring it as well, because an application error can creep in easily and be deployed, and you will not find it anywhere but in production. The pmp-check-mysql-deadlocks plugin can help you detect such problems, in conjunction with Percona Toolkit’s pt-deadlock-logger tool.
  • Open but deleted filehandles. It’s quite easy for a bad logrotate script (such as the one that ships by default with many server installations) or a system administration mistake to delete one of MySQL’s log or data files from disk. The database server will continue to function, however, because the file isn’t really deleted. The MySQL server keeps an open filehandle to the file, and it won’t be deleted until the server restarts, so the problem remains undetected while it becomes more and more severe. This can cause two serious types of problems: the disk can fill up with no apparent cause or solution, or you can restart the server and lose your data permanently. The pmp-check-mysql-deleted-files plugin will alert when files have been deleted but remain open via a filehandle.
  • Wrong filesystem privileges. It’s very easy for a file to creep into MySQL’s data directory with wrong permissions, without causing problems until a later date. For example, if a database isn’t owned by MySQL, you won’t have any problems until you try to create or drop a table, or the database itself. Likewise, if a system administrator leaves a temporary file in the database directory and then you try to drop the database, it will fail. These things tend to happen at inconvenient times. The pmp-check-mysql-file-privs plugin checks whether MySQL owns every file and directory within its data directory.
  • InnoDB is getting stuck, has long-running transactions, or other problems. The pmp-check-mysql-innodb plugin checks for several indicators of severe internal problems within InnoDB, such as contention that crosses a tipping point and cripples the server. Sometimes such conditions are like credit card debt: easy to start, impossible to dig out afterwards. Another very common cause of serious incidents is application misbehavior, such as opening a transaction but never committing it, which creates a lot of bloat or locking in the database server. This plugin can alert on that and a couple other types of common problems, too.
  • Missing PID file. A variety of problems can delete MySQL’s PID file, such as a buggy init script that attempts to start the server when it’s already running, or executing the server by hand instead of through the init script. Overly zealous cron jobs to delete old files in /tmp can do the same thing. If the PID file is missing, then many init scripts will behave very badly. Sample problems include starting two instances of MySQL on the same files in the data directory, and refusing to shut down MySQL so that it is killed forcibly (i.e. crashed) instead of gracefully shutting down for a system reboot. The pmp-check-mysql-pidfile plugin verifies that MySQL’s PID file exists.
  • Evidence of contention in the processlist. The pmp-check-mysql-processlist plugin checks SHOW FULL PROCESSLIST for patterns that indicate similar problems to those discussed within InnoDB, but it can find evidence of problems in the server as well as the storage engine. For example, a pile-up of connections all trying to open or close tables can indicate LOCK_open contention.
  • Replication is delayed or stopped. Our plugins are structured to monitor as few things as possible, but as many as needed. Replication is one thing that needs to be monitored in two dimensions, which shouldn’t be combined: delay (Seconds_behind_master or the delay indicated by the pt-heartbeat tool, if you choose), and actual failure. The pmp-check-mysql-replication-delay plugin checks for excessive delay, and pmp-check-mysql-replication-running checks that replication is functioning correctly.
  • Replicas have different data than their master. In addition to delay or failure, replication can fail silently and data can diverge or drift on a replica. The pt-table-checksum tool from Percona Toolkit can generate table checksums, and the pmp-check-pt-table-checksum plugin can alert when those checksums indicate that the replica is not logically identical to its master.
  • The server is allocating too much memory. A runaway server process, or poorly optimized queries, can consume too much memory and cause the server to swap until it is unresponsive and has to be power cycled. In addition, other processes can be runaways, too, such as a poorly written script or daemon; it is not unheard of for a monitoring agent or RAID utility software to do this. The pmp-check-unix-memory plugin checks for a single process consuming too much memory and/or for too little free memory in general. This plugin requires careful configuration because it uses a threshold for which there is no good one-size-fits-all value. This plugin, perhaps more than any other, should only be used judiciously.
  • The server has been restarted recently. It is surprising how often a server crashes, perhaps even cycling through crashes, and is not detected until sometime later when the problem has grown much more serious. You can use the pmp-check-mysql-status plugin to alert when the server’s Uptime variable is too small.
  • The server is approaching max_connections. Overflowing the max_connections setting is a bit like a lack of disk space: it is possible for it to be slowly approached over time, and then to begin causing problems when connections are rejected. Similarly to deadlocks and timeouts, applications often don’t handle or log this error appropriately. The pmp-check-mysql-status plugin can be configured to alert when Threads_connected approaches max_connections too closely, which can give the system administrator time to address the situation before it becomes a problem. This check requires careful configuration, or it too can become a noise generator. However, if your application environment makes it difficult to handle max_connections problems (the use of fixed-size connection pooling, for example, which can sometimes require application downtime to resolve), then it might be worth considering to let you deal with the problem when it’s convenient instead of when it’s urgent.

In addition to the checks in that list, there are several others that can be performed with existing plugins, so there was no need to write new ones. For example, the plugins that ship with Nagios are suitable for checking that the server is alive and responds to a query.

The pmp-check-mysql-status is also very flexible and can be used to trigger an alert on almost anything from SHOW VARIABLES and SHOW STATUS that you want. If you want to monitor ratios and rates, for example, you can certainly do so; it also supports an incremental mode of operation, so you can (for example) get the incremental increase in a variable such as Queries and alert on a queries-per-second threshold. (I don’t suggest doing this; I’m only listing an example of what you can do. The documentation has several other examples.)

In future releases we will include more plugins to monitor more types of problems, but the current plugins already cover some of the most common and severe conditions that we’ve observed to cause actual production downtime. Please download and try out the plugins, and help us improve them. If you are a customer and you need help configuring any of the plugins, we are happy to assist. In addition, please ask our sales staff to include a free monitoring setup in a new support contract. I believe they will be happy to negotiate this with you.

Feb
22
2012
--

A week of symfony #268 (13->19 February 2012)

This week, the Symfony Live Paris 2012 conference was announced. Early bird tickets sold out in less than 24 hours and call for papers is open until March 30th. Meanwhile, Symfony 2.1 added hinclude support in the core.

Development mailing list

symfony 1 development highlights

Changelog:

  • r33338:
    [1.4]
    added a way to understand why a Doctrine migration fails

Symfony2 development highlights

Master branch:

  • a395873:
    [FrameworkBundle] add auto_start pass to the session storage options
  • dd5d72a:
    added Propel to the vendors.php script
  • d9ce982:
    [Propel] added tests for CollectionToArrayTransformer
  • 97cbf90:
    [Propel] added tests for the PropelDataCollector
  • 8935dec:
    [HttpFoundation] added support for SVG mime type
  • 57968f8:
    [HttpKernel] fixed call to Memcached::set() in MemcachedProfilerStorage
  • 4a0057f:
    [FrameworkBundle] added support for HInclude
  • 9f05d4a:
    [Console] added suggest on bad command name
  • b95284e:
    [Profiler] fixed memcache(d)
  • fb27de0:
    [Config] code and phpdoc cleanup
  • ed028d5:
    [WebProfilerBundle] made is_ajax available to the view when rendering panels
  • 6745b28:
    [Config] throw exceptions on invalid definition

2.0.x branch:

  • beb4fc0:
    [Locale] StubIntlDateFormatter::parse was throwing exception instead of returning Boolean false like intl implementation
  • ec7fb0b:
    [Routing] added a proper exception when a route pattern references the same variable more than once
  • 2434552:
    [Translation] fixed fallback location if location is longer than three characters

Repository summary: 3,953 watchers (#1 in PHP, #28 overall) and 1028 forks (#1 in PHP, #11 overall).

New plugins

  • jmHeadJs: includes HeadJS in all pages. Works with Apostrophe 1.5.

Updated plugins

  • cpDataTables:

    • upgraded to release 1.9.0
  • apostrophe:

    • modified aControls::aMultipleSelect so that random ids of select elements could be passed as an option
    • search services support storage of extra data with search documents. Pull out everything so events show date ranges in searches
    • nulling out a date field is also worthy of an update event
    • single selects and single uploads that don’t involve cropping trigger an immediate completion of the select
    • file slot, single image slot and MP3 slot immediately go to media library when first added
    • injectActualUrlIntoHref broken out separately so it can be called from elsewhere to insert the URL of the current page as a parameter of any URL being built
    • the new app_aMedia_reuse_duplicates flag prevents duplicate media from being stored
    • added basic support for multiple video selection
  • apostropheBlog:

    • the end date now automatically updates to match the start date if it is currently earlier than the start date

They talked about us


Be trained by Symfony experts
2012-03-26 Paris
2012-03-26 Paris
2012-04-02 Paris

Written by in: Zend Developer |
Feb
21
2012
--

DBD::mysql 4.014 breaks pt-table-checksum 2.0

DBD::mysql 4.014 breaks pt-table-checksum 2.0.  The cause is unknown, but the effect is a lot of errors like:

DBD::mysql::st execute failed: called with 2 bind variables when 6 are needed [for Statement "..." with ParamValues: ...] at ./pt-table-checksum line 7216.

The fix is simple: upgrade (or even downgrade) DBD::mysql to any version except 4.014. To see which version of DBD::mysql a system has, execute:

perl -MDBD::mysql -e 'print $DBD::mysql::VERSION, "\n";'

This bug may affect other Percona Toolkit tools, but currently pt-table-checksum 2.0 is the only victim. This bug does not affect pt-table-checksum 1.0, and it cannot be worked around in pt-table-checksum 2.0 because the bug in in DBD::mysql.

This bug affects pt-table-checksum 2.0 but not 1.0 because the newer version uses prepared statements with parameter values, whereas the older version does not. It seems, although I have not verified this, that DBD::mysql 4.014 has some sort of caching mechanism which causes it to use the wrong prepared statement.

Feb
20
2012
--

How to recover deleted rows from an InnoDB Tablespace

In my previous post I explained how it could be possible to recover, on some specific cases, a single table from a full backup in order to save time and make the recovery process more straightforward. Now the scenario is worse because we don’t have a backup or the backup restore process doesn’t work. How can I recover deleted rows?

We’re going to follow the same example as in my previous post so we need to delete the records of the employee 10008 from the table “salaries”. After the “accidental” deletion of rows you should stop MySQL, take a copy of the salaries.ibd and start it again. Later, we’ll extract those deleted rows from the ibd file and import them into the database. The time between the deletion of rows and the database stop is crucial. If pages are reused you can’t recover the data.

The tool that we’re going to use to achieve this objetive is Percona Data Recovery Tool for InnoDB. This is the tool that we use in our data recovery service and it’s open source.

I’m going to explain the full process in four different steps to make it clearer:

1- Extract all the InnoDB pages from the tablespace:

First we need to download Percona Data Recovery Tool and compile all the tools using the “make” command. In this example I’m going to install the tools in /root/recovery-tool folder and the data like tablespaces and recovered rows in /root/recovery-tool/data.

After the compile process we need to copy the salaries.ibd tablespace to the recovery-tool’s data directory. In order to extract all the pages we’ll use the page_parser tool. This tool will find and extract all the pages of the tablespace to an output directory. We only need to specify the row format (-5) and where is our tablespace located (-f)

The row format can be -4 (REDUNDANT) or -5 (COMPACT). From 5.0.3 the default format is COMPACT. More information about row format on the following link:

http://dev.mysql.com/doc/refman/5.5/en/innodb-physical-record.html

You can also get the table row format from the Information Schema:

mysql (information_schema) > SELECT ROW_FORMAT from TABLES WHERE TABLE_SCHEMA='employees' AND TABLE_NAME='salaries';
+------------+
| ROW_FORMAT |
+------------+
| Compact |
+------------+

~/recovery-tool# ./page_parser -5 -f data/salaries.ibd
Opening file: data/salaries.ibd:
[...]
71.43% done. 2012-02-14 13:10:08 ETA(in 00:00 hours). Processing speed: 104857600 B/sec

All pages are stored in a single directory with some subdirectories inside, one for every single index in the table:

~/recovery-tool# ls pages-1329221407/FIL_PAGE_INDEX/
0-26 0-27

In this case, there are two indexes with the IDs 0-26 and 0-27. InnoDB has Clustered Primary Key, that is, the data is organized along with the primary key. Therefore if we want to extract the rows data we need to identify which of those two indexes is the Primary Key. This is our next step.

2- Identifying the Primary Key

There are different methods to find the correct index and here I’m going to explain three of them:

A) INNODB_SYS_INDEXES

Percona Server has some extra tables in INFORMATION_SCHEMA that can help us to find the different indexes and types.

mysql (information_schema) > select i.INDEX_ID, i.NAME FROM INNODB_SYS_INDEXES as i INNER JOIN INNODB_SYS_TABLES as t USING(TABLE_ID) WHERE t.NAME='salaries';
+----------+---------+
| INDEX_ID | NAME |
+----------+---------+
| 26 | PRIMARY |
| 27 | emp_no |
+----------+---------+

B) InnoDB Table Monitor

The index information can also be taken directly from MySQL using the InnoDB Tablespace Monitor. This monitor will write all the information related with tables and indexes (with their IDs) to the error log.

mysql (employees) > CREATE TABLE innodb_table_monitor (id int) ENGINE=InnoDB;

TABLE: name employees/salaries, id 18, flags 1, columns 7, indexes 2, appr.rows 2844513
[...]
INDEX: name PRIMARY, id 26, fields 2/6, uniq 2, type 3
root page 3, appr.key vals 2844513, leaf pages 6078, size pages 6120
FIELDS: emp_no from_date DB_TRX_ID DB_ROLL_PTR salary to_date
INDEX: name emp_no, id 27, fields 1/2, uniq 2, type 0
root page 4, appr.key vals 306195, leaf pages 2189, size pages 2212
FIELDS: emp_no from_date
[...]

The second method has the same result, 0-26 is our primary key. After identifying the Primary Key don’t forget to remove the innodb_table_monitor.

C) Check the size on disk of every indes

This is very dependent of the table schema, but normally the primary key will be larger on disk because it stores also the row itself.

~/recovery-tool/pages-1329221407/FIL_PAGE_INDEX# du -hs 0-26/
96M 0-26/
~/recovery-tool/pages-1329221407/FIL_PAGE_INDEX# du -hs 0-27/
35M 0-27/

In our examples 0-26 seems to be the Primary Key.

3- Extract the rows

We know in which index is the data so the next step is clear, extract the rows from it. To accomplish this task we’ll use constraint_parser command. In order to use it the tool needs to know the table schema structure, that is, column types, names and attributes. This information needs to be available on the header file recovery-tools/include/table_defs.h so it will be necessary to recompile the tool.

To help us in the task of converting the schema definition to a C header file there is another tool with the name create_defs.pl. This one will connect to the database in order to examine the table and create the table_defs.h content.

~/recovery-tool# ./create_defs.pl --host 127.0.0.1 --port 5520 --user root --password msandbox --db employees --table salaries > include/table_defs.h

More information about the table_defs.h format in the following link:

http://www.percona.com/docs/wiki/innodb-data-recovery-tool:mysql-data-recovery:generating_a_table_definition

With the table definition on table_defs.h it’s the moment to compile again all the tools with “make” command. After the compile process we can use constraints_parser to recover the rows in a human readable format.

~/recovery-tool# ./constraints_parser -5 -D -f pages-1329221407/FIL_PAGE_INDEX/0-26/ > data/salaries.recovery

With -D option we are asking to the constraints_parser to recover only deleted pages. -5 and -f are the same options that we’ve used before with page_parser.

In the salaries.recovery you can find lot of deleted rows, not only rows deleted accidentally. You should manually find the data you need to recover and save it in another file. This is the output of our example:

~/data-recovery# cat data/salaries.recovery
salaries 10008 "1998-03-11" 46671 "1999-03-11"
salaries 10008 "1999-03-11" 48584 "2000-03-10"
salaries 10008 "2000-03-10" 52668 "2000-07-31"

4- Import the rows

With the data in our hands the last step is to import it in our database:

mysql (employees) > LOAD DATA INFILE '/root/recovery-tool/data/salaries.recovery' REPLACE INTO TABLE `salaries` FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '"' LINES STARTING BY 'salaries\t' (emp_no, from_date, salary, to_date);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 0

mysql (employees) > select * from salaries where emp_no=10008;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10008 | 46671 | 1998-03-11 | 1999-03-11 |
| 10008 | 48584 | 1999-03-11 | 2000-03-10 |
| 10008 | 52668 | 2000-03-10 | 2000-07-31 |
+--------+--------+------------+------------+

Data recovered! 10008 has recovered his salary again :)

Conclusion

With InnoDB the deleted rows are not lost. You can recover them from the original tablespace and also from an ibd file if you have a binary backup. Just use constraint_parser without the -D option (deleted) and you will recover all the data that exists inside that tablespace

Feb
20
2012
--

How to convert MySQL’s SHOW PROFILES into a real profile

SHOW PROFILES shows how much time MySQL spends in various phases of query execution, but it isn’t a full-featured profile. By that, I mean that it doesn’t show similar phases aggregated together, doesn’t sort them by worst-first, and doesn’t show the relative amount of time consumed.

I’ll profile the “nicer_but_slower_film_list” included with the Sakila sample database to demonstrate:

mysql> SET profiling=1;
mysql> pager cat > /dev/null
mysql> SELECT * FROM nicer_but_slower_film_list;
997 rows in set (0.18 sec)

The query consumed 0.18 seconds. Where did the time go?

mysql> SHOW PROFILE FOR QUERY 1;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000032 |
| checking permissions | 0.000005 |
... snip ...
| init                 | 0.000021 |
| optimizing           | 0.000003 |
| statistics           | 0.000006 |
| cleaning up          | 0.000003 |
+----------------------+----------+
35 rows in set (0.00 sec)

It’s pretty hard to figure out what consumed the most time, because it’s sorted in execution order, not order of time consumption. Here is one query that can solve this partially:


SET @query_id := 1;
SELECT STATE, SUM(DURATION) AS Total_R,
       ROUND(
          100 * SUM(DURATION) /
             (SELECT SUM(DURATION)
              FROM INFORMATION_SCHEMA.PROFILING
              WHERE QUERY_ID = @query_id
          ), 2) AS Pct_R,
       COUNT(*) AS Calls,
       SUM(DURATION) / COUNT(*) AS "R/Call"
    FROM INFORMATION_SCHEMA.PROFILING
    WHERE QUERY_ID = @query_id
    GROUP BY STATE
    ORDER BY Total_R DESC; 

+----------------------+----------+-------+-------+--------------+
| STATE                | Total_R  | Pct_R | Calls | R/Call       |
+----------------------+----------+-------+-------+--------------+
| removing tmp table   | 0.095135 | 55.10 |     3 | 0.0317116667 |
| Copying to tmp table | 0.046175 | 26.74 |     1 | 0.0461750000 |
| Sending data         | 0.018478 | 10.70 |     3 | 0.0061593333 |
| Sorting result       | 0.011090 |  6.42 |     1 | 0.0110900000 |
| checking permissions | 0.000802 |  0.46 |     6 | 0.0001336667 |
| Creating tmp table   | 0.000548 |  0.32 |     1 | 0.0005480000 |
| Opening tables       | 0.000196 |  0.11 |     1 | 0.0001960000 |
| statistics           | 0.000071 |  0.04 |     2 | 0.0000355000 |
| starting             | 0.000032 |  0.02 |     1 | 0.0000320000 |
| freeing items        | 0.000027 |  0.02 |     2 | 0.0000135000 |
| preparing            | 0.000025 |  0.01 |     2 | 0.0000125000 |
| init                 | 0.000021 |  0.01 |     1 | 0.0000210000 |
| optimizing           | 0.000020 |  0.01 |     2 | 0.0000100000 |
| closing tables       | 0.000014 |  0.01 |     2 | 0.0000070000 |
| System lock          | 0.000010 |  0.01 |     1 | 0.0000100000 |
| query end            | 0.000003 |  0.00 |     1 | 0.0000030000 |
| cleaning up          | 0.000003 |  0.00 |     1 | 0.0000030000 |
| executing            | 0.000002 |  0.00 |     2 | 0.0000010000 |
| end                  | 0.000002 |  0.00 |     1 | 0.0000020000 |
| logging slow query   | 0.000001 |  0.00 |     1 | 0.0000010000 |
+----------------------+----------+-------+-------+--------------+

Much nicer. Now you can see that over 3/4ths of the time was spent working with temporary tables.

But there’s something still missing: it doesn’t show lost time (the amount of time not accounted by the profiling). That is, the SUM(DURATION) isn’t the same as the total query duration. Alas, there is no query-level table that would allow me to subtract the SUM(DURATION) from the query’s real response time. If there were, I could add in a UNION to inject another row for “lost time” and show the portion of response time that wasn’t captured in the profile.

The above query is something I developed for High Performance MySQL Third Edition, by the way. The book should be available in a matter of weeks.

Feb
20
2012
--

MySQL Meetups in North Carolina this week

A quick reminder. I’m speaking on two MySQL meetups in North Carolina this week. Tuesday,Feb 21 on Raleigh MySQL/PHP Meetup and when Wednesday, Feb 22 on Charlotte Queen City PHP Note the last meetup date has changed, it was originally planned for February 23 but had to be rescheduled due to conflicts.

Feb
19
2012
--

Why don’t our new Nagios plugins use caching?

In response to the release of our new MySQL monitoring plugins on Friday, one commenter asked why the new Nagios plugins don’t use caching. It’s worth answering in a post rather than a comment, because there is an important principle that needs to be understood to monitor servers correctly. But first, some history.

When I wrote a set of high-quality Cacti templates for MySQL a few years ago (which are now replaced by the new project), making the Cacti templates use caching was important for two reasons:

  1. Performance. Cacti runs some of its polling processes serially, so if each graph has to reach out to the MySQL server and retrieve a bunch of data, the polling can take too long. I’ve seen cases where a Cacti server that’s graphing too many MySQL servers doesn’t finish the current iteration before the next one starts. Making sure the data isn’t retrieved from MySQL repeatedly cuts a lot of latency out of the process. In fact, I went further than that; I didn’t cache the data retrieved from MySQL, I cached the resulting metrics. This avoids having to do InnoDB parsing repeatedly, too. With this approach, Cacti is capable of monitoring many more servers.
  2. Consistency. Graphs are supposed to be generated at a single instant in time, but as I just said, they actually aren’t. Graphs that show slightly different data, or greatly different data in some cases I observed, are frustrating at best and wrong or misleading at worst. When you’re using these graphs for forensic or diagnostic purposes, this is very bad. Caching the results from the first call and reusing it for all subsequent polls avoids this problem. All of the graphs are generated from one and only one set of data retrieved from MySQL.

It’s worth noting that Cacti doesn’t do this itself, which I consider to be a Cacti design shortcoming. I did not like writing my own cache management code. Handling cached data correctly is one of the two hard things in programming (choosing proper variable names is the other one, of course).

Now, somewhere along the line “you should cache when you are monitoring a system for failures” became a tacit best practice. I don’t know why, but I can speculate that there might have been a misunderstanding — perhaps caching was understood to reduce load on MySQL, rather than reducing load on the Cacti server and speeding up the poller.

This is a very bad idea! I do NOT advocate caching for Nagios checks. Why? Exactly the inverse of the two reasons that graphing should be cached!

  1. Performance doesn’t matter. First, most Nagios checks are run once every five minutes, or once a minute at most. Running SHOW STATUS infrequently doesn’t add load to the server. Most real production installations I see actually have multiple people running innotop or what have you, once a second! Second, if your monitoring is to be at all useful, you should not monitor too many things inside of MySQL. If you take the typical mass-market monitoring program’s silly “alert on 99 different kinds of cache hit ratio thresholds” approach, you’ll soon have sysadmins with /dev/null email filters, which will make your monitoring system utterly useless. However, even if you have multiple checks running once a minute, it’s still a trivial amount of load.
  2. Consistency is just another word for staleness. Great, your monitoring system thinks that the server is fine, based on a three-minute-old file — what good is that? You should be monitoring the server, as of NOW, rather than monitoring the contents of some file as of several minutes ago.

In conclusion, you should cache for historical metrics monitoring, but not for fault detection monitoring. That’s why the new Nagios plugins don’t have any caching functionality.

Feb
18
2012
0

Fake Yzmir Ramirez Facebook Account

http://www.facebook.com/yzmir.ramirez is not me. Looks like its another fake account.

Written by in: facebook | Tags: , ,
Feb
18
2012
--

MariaDB 5.3.4 benchmarks

MariaDB 5.3 has reached the release candidate milestone, and the 5.3 version promises a lot of new features and optimization (i.e in optimizer http://kb.askmonty.org/en/what-is-mariadb-53#query-optimizer). No surprise I wanted to check how all improvements affect general performance.

So I why don’t we run old good sysbench benchmark.

For the benchmark I took:

  • HP ProLiant DL380 G6 box
  • sysbench multitables oltp rw workload, 16 tables, 500mil rows each, total datasize about 30GB
  • working threads from 1 to 256
  • Versions: MariaDB 5.3.4, MySQL 5.5.20
  • Data is stored on RAID10 HDD partition
  • Like in all my recent benchmarks, I make throughput measurements each 10 sec, so we can see the stability of the throughout

The raw results, configuration and scripts are available on our Benchmarks Launchpad

The graphical results:

Throughput (more is better)

Threads MariaDB 5.3.4 MySQL 5.5.20  Ratio
1        252           271          0.9298893
2        412           588          0.7006803
4        801          1097          0.7301732
8       1709          2205          0.7750567
16      3197          4076          0.7843474
32      3303          4166          0.7928469
64      3336          4150          0.8038554
128     3800          4170          0.9112710
256     3710          4131          0.8980876

I was surprised to see that MariaDB shows 20-30% worse throughput.
It seems many changes resulted to performance hit in general. I wonder whether MariaDB team runs performance regression benchmarks, and if they do, why do we see such performance decline.


Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com