Sep 14, 2009

SystemTap – DTrace for Linux?

Ever since DTrace was released for Solaris, I have missed it on Linux systems. It can’t be included in Linux for the same reason ZFS can’t: licensing. Both ZFS and DTrace are under the CDDL, which is incompatible with the GPL. So you can find DTrace and ZFS on Solaris, FreeBSD, and Mac OS X, but not on Linux.

However, I have been following the SystemTap project for a couple of years (it was started in 2005); it is supposed to provide functionality similar to DTrace.

I am interested in this tool because there is no simple way under Linux to profile a load that is not CPU-bound (for CPU-bound load there is OProfile; see, for example, http://mysqlinsights.blogspot.com/2009/08/oprofile-for-io-bound-apps.html). That is, for IO-bound workloads or mutex contention problems, OProfile is not that useful.

SystemTap is included in RedHat 5 releases, but I was not able to get it running even on CentOS 5.3 (it crashed and hung every so often). The latest update, RedHat 5.4, promised more fixes to SystemTap, so I decided to give it another try as soon as I got my hands on RedHat 5.4.

Surprisingly, it now runs much more stably. I was able to profile the kernel and system calls.
Here is a simple script to show IO activity per disk per process (it is similar to iotop, but iotop is not available in RedHat / CentOS)

with output like this:

CODE:

Mon Sep 14 05:22:14 2009 , Average:20353Kb/sec, Read:    4337Kb, Write:  97428Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
      27     3701     3651                    mysqld     dm-0    W     99766272
      27     3701     3651                    mysqld     dm-0    R      4440064
       0     2324     2296           hald-addon-stor     dm-0    R         1242

Mon Sep 14 05:22:19 2009 , Average:21756Kb/sec, Read:    4263Kb, Write: 104521Kb

     UID      PID     PPID                       CMD   DEVICE    T        BYTES
      27     3701     3651                    mysqld     dm-0    W    107029504
      27     3701     3651                    mysqld     dm-0    R      4358144
       0     2883     2879           pam_timestamp_c     dm-0    R         6528
       0     2324     2296           hald-addon-stor     dm-0    R          828
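
As a quick sanity check on the header lines: the two samples are 5 seconds apart, and (Read + Write) divided by 5 reproduces the reported average. Below is a minimal Python sketch of that check. The 5-second interval is an assumption inferred from the timestamps, and the function name is made up for illustration.

```python
# Cross-check of the "Average" figure in the sample header lines.
# Assumption: the sampling interval is 5 seconds, inferred from the two
# timestamps (05:22:14 and 05:22:19).


def average_kb_per_sec(read_kb: int, write_kb: int, interval_sec: int = 5) -> int:
    """Total KB moved in the interval divided by its length (integer division)."""
    return (read_kb + write_kb) // interval_sec


if __name__ == "__main__":
    # Read/Write totals copied from the two header lines above.
    print(average_kb_per_sec(4337, 97428))
    print(average_kb_per_sec(4263, 104521))
```

Both results agree with the Average values printed in the headers, which suggests the average covers reads and writes together.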

This example may be simple, but the point is that there is a rich scripting language with tons of probes you can intercept (kernel functions, filesystem driver functions, any other drivers and modules).

Another thing I find very useful in SystemTap is that it can work in userspace. That is, you can use it to profile your own or any other application that has -debuginfo packages (all -debuginfo packages for standard RedHat RPMs can be downloaded from the RedHat FTP); basically, it is the info you get by compiling with gcc -g.

Well, there seems to be another war story going on. To profile a userspace application with SystemTap, your kernel must be patched with the uprobes patch, which fortunately is included in RedHat-based kernels but not yet in the vanilla kernel. So I am not sure whether you can get userspace profiling running on other distributions.

Here is a quite simple script that I used to hack around MySQL:

CODE:

# Print every mysqld function whose name matches *innobase*, with its arguments
probe process("/usr/libexec/mysqld").function("*innobase*")
{
  printf("%s(%s)\n", probefunc(), $$parms)
}

and here is the output I get when running a simple SELECT against an InnoDB table:

CODE:

stap -v lsprob.stp
Pass 1: parsed user script and 52 library script(s) in 240usr/10sys/261real ms.
Pass 2: analyzed script: 107 probe(s), 22 function(s), 1 embed(s), 0 global(s) in 540usr/20sys/554real ms.
Pass 3: using cached /root/.systemtap/cache/4f/stap_4f8b8738f58ff78e294c62765ac83d91_36925.c
Pass 4: using cached /root/.systemtap/cache/4f/stap_4f8b8738f58ff78e294c62765ac83d91_36925.ko
Pass 5: starting run.
innobase_register_trx_and_stmt(thd=? )
innobase_register_stmt(thd=? )
innobase_map_isolation_level(iso=? )
innobase_release_stat_resources(trx=0x2aaaaaddb8b8 )
convert_search_mode_to_innobase(find_flag=? )
innodb_srv_conc_enter_innodb(trx=? )
srv_conc_enter_innodb(trx=0x2aaaaaddb8b8 )
innodb_srv_conc_exit_innodb(trx=? )
srv_conc_exit_innodb(trx=0x2aaaaaddb8b8 )
innobase_release_temporary_latches(thd=0x1a6aced0 )
innobase_release_stat_resources(trx=? )
srv_conc_force_exit_innodb(trx=0x2aaaaaddb8b8 )

Again, this case may be too simple, but basically you can intercept internal MySQL functions and script whatever you want (measure time, count calls, gather statistics). I have not yet figured out how to intercept C++-style functions (e.g. ha_innobase::index_read), so there is an area to investigate.

So I am going to play with it more and write some useful scripts to profile MySQL.

It also seems SystemTap can reuse DTrace probes available in an application. As you may know, DTrace probes were added in MySQL 5.4, so it will be interesting to see how this works.

I should mention that there is a second alternative to DTrace… it’s a DTrace port. Judging by the blog, it seems to be a one-man project, and the author is currently fighting with userspace issues. I gave it a try, but on my current RedHat 5.4 I got a “Kernel panic” after several runs, so that’s enough for now.


Entry posted by Vadim

Sep 14, 2009

Statistics of InnoDB tables and indexes available in xtrabackup

If you ever wondered how big a given index in InnoDB is, you had to calculate it yourself by multiplying the size of a row (which, I should add, is harder in the case of a VARCHAR, since you need to estimate the average length) by the count of records. And it would still be quite inaccurate, as secondary indexes tend to take more space. So we added more detailed index statistics to our xtrabackup utility. Thanks for this feature go to a well-known social network that sponsored the development.
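
The hand calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the record count and the column widths are hypothetical, and the 5-byte per-record overhead is an assumption (it corresponds to the record header of InnoDB's compact row format, so check it against your own row format).

```python
from typing import Sequence

# Hand estimate of an index's data size: records * bytes per record.
# Assumptions (not from the post): fixed-width columns only, and a
# per-record overhead of 5 bytes. Page fill factor and non-leaf pages
# are ignored, which is one reason the estimate is inaccurate.


def estimate_index_bytes(records: int, column_widths: Sequence[int],
                         record_overhead: int = 5) -> int:
    """Naive size estimate for the leaf-level data of an index."""
    return records * (sum(column_widths) + record_overhead)


if __name__ == "__main__":
    # Hypothetical secondary index on three 4-byte columns. InnoDB secondary
    # index records also carry the primary key columns, here one 4-byte int.
    print(estimate_index_bytes(1_000_000, [4, 4, 4, 4]))
```

For VARCHAR columns you would have to substitute an estimated average length, which is exactly where the manual approach gets shaky.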

We chose to put this into xtrabackup for a couple of reasons. The first is that running statistics on your backup database does not hurt production servers; the second is that running statistics on a stopped database is more accurate than on an online one (online is also supported, but you may get inexact results).

Let’s see how it works. I have one table of about 13GB that was filled over about 2.5 years.
The table is:

CODE:

CREATE TABLE `link_out104` (
  `domain_id` int(10) unsigned NOT NULL,
  `link_id` int(10) unsigned NOT NULL auto_increment,
  `url_from` varchar(255) NOT NULL,
  `url_to` varchar(255) NOT NULL,
  `anchor` varchar(255) NOT NULL,
  `from_site_id` int(10) unsigned NOT NULL,
  `from_forum_id` int(10) unsigned NOT NULL,
  `from_author_id` int(10) unsigned NOT NULL,
  `from_message_id` bigint(20) unsigned NOT NULL,
  `message_published` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `kind` enum('link','img') NOT NULL,
  `url_title` varchar(255) NOT NULL,
  `isexternal` tinyint(3) unsigned NOT NULL,
  `revert_domain` varchar(255) NOT NULL,
  `url_prefix` varchar(255) NOT NULL,
  `from_domain_id` int(10) unsigned NOT NULL,
  `ext` varchar(25) NOT NULL,
  `linktype` enum('html','video','mp3','image','pdf','other') NOT NULL,
  `message_day` date NOT NULL,
  `mod_is` tinyint(3) unsigned NOT NULL default '0',
  `is_adult` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`link_id`),
  UNIQUE KEY `domain_id_2` (`domain_id`,`link_id`),
  KEY `domain_id` (`domain_id`,`from_site_id`,`message_published`),
  KEY `revert_domain` (`revert_domain`,`url_prefix`(80)),
  KEY `from_site_id` (`from_site_id`,`message_published`),
  KEY `site_message` (`from_site_id`,`message_day`,`isexternal`),
  KEY `from_message_id` (`from_message_id`,`link_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26141165 DEFAULT CHARSET=utf8;

And the size of the file is about 12.88 GB:

-rw-r--r-- 1 root root 13832814592 Sep 10 14:41 link_out104.ibd

So to get statistics we run:

xtrabackup --stats --tables=art.link* --datadir=/mnt/data/mysql/

which will show something like this:

CODE:

<INDEX STATISTICS>

  table: art/link_out104, index: PRIMARY, space id: 12, root page 3
  estimated statistics in dictionary:
    key vals: 25265338, leaf pages 497839, size pages 498304
  real statistics:
     level 2 pages: pages=1, data=5395 bytes, data/pages=32%
     level 1 pages: pages=415, data=6471907 bytes, data/pages=95%
        leaf pages: recs=25958413, pages=497839, data=7492026403 bytes, data/pages=91%

  table: art/link_out104, index: domain_id_2, space id: 12, root page 4
  estimated statistics in dictionary:
    key vals: 27755790, leaf pages 23125, size pages 26495
  real statistics:
     level 2 pages: pages=1, data=510 bytes, data/pages=3%
     level 1 pages: pages=30, data=393125 bytes, data/pages=79%
        leaf pages: recs=25958413, pages=23125, data=337459369 bytes, data/pages=89%

  table: art/link_out104, index: domain_id, space id: 12, root page 5
  estimated statistics in dictionary:
    key vals: 3006231, leaf pages 43255, size pages 49600
  real statistics:
     level 2 pages: pages=1, data=2850 bytes, data/pages=17%
     level 1 pages: pages=114, data=1081375 bytes, data/pages=57%
        leaf pages: recs=25953873, pages=43255, data=545031333 bytes, data/pages=76%

  table: art/link_out104, index: revert_domain, space id: 12, root page 6
  estimated statistics in dictionary:
    key vals: 1204830, leaf pages 133869, size pages 153984
  real statistics:
     level 3 pages: pages=1, data=373 bytes, data/pages=2%
     level 2 pages: pages=6, data=58143 bytes, data/pages=59%
     level 1 pages: pages=832, data=9146283 bytes, data/pages=67%
        leaf pages: recs=25839414, pages=133869, data=1566961607 bytes, data/pages=71%

  table: art/link_out104, index: from_site_id, space id: 12, root page 7
  estimated statistics in dictionary:
    key vals: 330426, leaf pages 33889, size pages 38848
  real statistics:
     level 2 pages: pages=1, data=1764 bytes, data/pages=10%
     level 1 pages: pages=84, data=711669 bytes, data/pages=51%
        leaf pages: recs=25956416, pages=33889, data=441259072 bytes, data/pages=79%

  table: art/link_out104, index: site_message, space id: 12, root page 8
  estimated statistics in dictionary:
    key vals: 1399286, leaf pages 32260, size pages 36992
  real statistics:
     level 2 pages: pages=1, data=1680 bytes, data/pages=10%
     level 1 pages: pages=80, data=677460 bytes, data/pages=51%
        leaf pages: recs=25956043, pages=32260, data=441252731 bytes, data/pages=83%

  table: art/link_out104, index: from_message_id, space id: 12, root page 9
  estimated statistics in dictionary:
    key vals: 25964521, leaf pages 27979, size pages 28160
  real statistics:
     level 2 pages: pages=1, data=798 bytes, data/pages=4%
     level 1 pages: pages=38, data=587559 bytes, data/pages=94%
        leaf pages: recs=25958413, pages=27979, data=441293021 bytes, data/pages=96%

The output is extensive, so let me highlight some points:

CODE:

table: art/link_out104, index: PRIMARY, space id: 12, root page 3
        leaf pages: recs=25958413, pages=497839, data=7492026403 bytes, data/pages=91%

It says that the PRIMARY key (which is the table itself, as InnoDB clusters data by primary key) takes 497839 pages (16KB each), and the size of the data is 7492026403 bytes (6.98 GB). The density (how well the data fits into pages) is quite good: 91%. That was expected, as the table is mostly inserted into; updates and deletes are rare.

Now let’s take the index domain_id:

CODE:

table: art/link_out104, index: domain_id, space id: 12, root page 5
        leaf pages: recs=25953873, pages=43255, data=545031333 bytes, data/pages=76%

You can see the allocated pages (43255 pages or 708689920 bytes) are only 76% full (the data takes 545031333 bytes). That means about 150MB is just wasted space. It is even worse for the key revert_domain:

leaf pages: recs=25839414, pages=133869, data=1566961607 bytes, data/pages=71%

For this key about 600MB is empty.
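
The arithmetic behind these figures can be reproduced with a short Python sketch. It parses a "leaf pages" line as printed above and derives the allocated size, fill percentage and unused space. The function and field names are made up for illustration; the 16KB page size is the one stated above, and truncating the percentage (rather than rounding) is an assumption chosen because it matches the data/pages values in the output.

```python
import re

PAGE_SIZE = 16 * 1024  # 16KB pages, as stated in the post

# Matches the "recs=..., pages=..., data=... bytes" part of a leaf pages line.
LEAF_RE = re.compile(r"recs=(\d+), pages=(\d+), data=(\d+) bytes")


def leaf_stats(line: str) -> dict:
    """Parse one 'leaf pages' line and derive allocated size, fill and waste."""
    match = LEAF_RE.search(line)
    if match is None:
        raise ValueError("not a leaf pages line: %r" % line)
    recs, pages, data = (int(group) for group in match.groups())
    allocated = pages * PAGE_SIZE
    return {
        "recs": recs,
        "allocated_bytes": allocated,
        "data_bytes": data,
        # Assumption: truncated, not rounded, to mirror the tool's output.
        "fill_pct": data * 100 // allocated,
        "wasted_bytes": allocated - data,
    }


if __name__ == "__main__":
    lines = [
        "leaf pages: recs=25953873, pages=43255, data=545031333 bytes, data/pages=76%",
        "leaf pages: recs=25839414, pages=133869, data=1566961607 bytes, data/pages=71%",
    ]
    for line in lines:
        stats = leaf_stats(line)
        wasted_mb = stats["wasted_bytes"] / (1024 * 1024)
        print("fill=%d%% wasted=%.0fMB" % (stats["fill_pct"], wasted_mb))
```

The same function can be run over every leaf pages line in the full output to rank indexes by unused space.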

This needs a bit of explaining: the secondary indexes are not filled as efficiently as the primary key, but a lot of this is to be expected. In many cases we insert into the primary key in order, which makes things very predictable, but inserts into a secondary key index are random, which leads to a lot of page splits.

One helpful new feature that addresses this is in XtraDB / InnoDB plugin: fast index creation. With this feature, InnoDB creates indexes by sorting, so the page fill factor should be quite good.

To check that, here is the xtrabackup --stats output for the index domain_id created for a table in Barracuda format with the fast creation method:

CODE:

table: art/link_out104, index: domain_id, space id: 15, root page 49160
  estimated statistics in dictionary:
    key vals: 5750565, leaf pages 34383, size pages 34496
  real statistics:
     level 2 pages: pages=1, data=1375 bytes, data/pages=8%
     level 1 pages: pages=55, data=859575 bytes, data/pages=95%
        leaf pages: recs=25958413, pages=34383, data=545126673 bytes, data/pages=96%

As you can see, this time it takes 34383 pages (compared to 43255 in the previous statistics).
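
To put that difference into bytes, here is a small Python sketch. The function name is made up for illustration, and it assumes the same 16KB page size for both tables.

```python
PAGE_SIZE = 16 * 1024  # assumption: 16KB pages in both tables


def pages_saved(before_pages: int, after_pages: int):
    """Return (pages saved, bytes saved, percent of the original page count)."""
    saved = before_pages - after_pages
    return saved, saved * PAGE_SIZE, saved * 100.0 / before_pages


if __name__ == "__main__":
    # Leaf page counts of index domain_id before and after fast index creation.
    saved, saved_bytes, pct = pages_saved(43255, 34383)
    print("%d pages, %.1f MB, %.1f%% fewer leaf pages"
          % (saved, saved_bytes / (1024 * 1024), pct))
```

The leaf data itself is almost the same size in both cases (545031333 versus 545126673 bytes), so the saving comes from denser pages, not from less data.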

It would be interesting to see how it grows with further inserts, though, and I also suspect random INSERTs into such a dense space are going to be slower than in the previous case.

The --stats option is not in an xtrabackup release yet, only in the source code repository, but it should be released quite soon.

And the last point of the post: if you are badly missing some features in MySQL, InnoDB, InnoDB-plugin, XtraDB, XtraBackup, you know whom to ask!


Entry posted by Vadim

Jul 12, 2009

PHP urlEncode urlDecode Tool

So while I was at it, I made this PHP urlEncode urlDecode Tool because I keep forgetting to bookmark other sites’ urlencode/urldecoders. Also because the Mac version of Hackbar in Firefox doesn’t have the same options as the Windows version.

Jul 12, 2009

PHP TimeDiff Tool

I needed to send an email with a human-readable string of the difference between two timestamps, so I made this little tool:

PHP TimeDiff Tool

Jun 20, 2009

Update PHP Tools page

I updated my tools page. The PHP timestamp converter tool now uses timezone_identifiers_list() (e.g. ‘America/New_York’) instead of hard-coded offsets.

Also moved all the scripts into one directory here.

When I get bored again I’ll add some more.  I actually use these from work so it’s really just a convenience for me.

Apr 05, 2009

The best symfony IDE for Mac: Probably still Zend Studio

So in a recent symfony-zone article about “the best Symfony IDE: PHPEdit”, the author recommended using PHPEdit. The problem is that PHPEdit doesn’t have a Mac OS X version, according to their requirements page as of this date.

So for now, using Zend Studio or BBEdit is probably your best bet. I haven’t tried Netbeans, but at least one developer I know still uses it. Of course, you could just use vi if all else fails.

One last thing: Zend Studio probably won’t ever have any first-party support for symfony, since Zend Studio’s parent company has its own framework, so I wouldn’t hold my breath for that.

Mar 31, 2009

Conficker worm removal tool – from Microsoft!!!

Don’t download tools from other sites that claim to remove what is probably the most prevalent virus in computer history. Go to this page on Microsoft’s website:

http://www.microsoft.com/protect/computer/viruses/worms/conficker.mspx

Mar 29, 2009

10+ Best Firefox Addons for Security and Privacy

Any web developer today is probably running Firefox because of some of the great add-ons it has (e.g. WebDeveloper, Colorzilla, Firebug, Hackbar, etc.). But you may also want to install some add-ons for security:

Security and privacy are some of the major concerns these days while choosing a web browser to use. So much so that all the major players in the “browser wars” are providing or developing a private browsing mode.

Firefox, with the myriad of add-ons that it has to offer, is never far from the action. Here are some of the top Firefox add-ons that you should install for better privacy and security.

Check out this article for the list of 10+ Best Firefox Addons for Security and Privacy.

Mar 28, 2009

4 Essential Web Developer Addons for Firefox

I would argue that Firefox (currently #2) is probably the easiest of the browsers to develop for because of the great add-ons it has available. If you’re not already using any of the below, download them now:

  1. Web Developer: Next to Firebug, this is a very useful tool for manipulating and inspecting web pages.  It gives you tools for JavaScript, cookies, CSS, forms, images, validation, and other useful utilities (I use the on-screen ruler a lot as well to get dimensions).
  2. Firebug: If you find yourself having to debug JavaScript or integrate some of the popular frameworks, you’ll need this tool.  It has some overlap with Web Developer, but when you really need to inspect the DOM and debug JavaScript, I can’t think of a better add-on.
  3. Colorzilla: Ever wish you had an eyedropper inside Firefox?  Well, here’s your tool.  A small add-on that allows you to get the colors from a page quickly without having to open up your graphics program or search the style sheet for the color.
  4. HackBar: When manipulating a URI is too cumbersome, you may want to invest in this little add-on.  Take a URI you’re on and split each parameter into a separate line in a small text editor in your toolbar.  Then you’ll have quick access to encode, encrypt, increment or decrement parameters easily for POST or GET calls.

Mar 21, 2009

Downloading older versions of software

So the adage, “Newer is not always better,” is true when it comes to software. I find myself getting used to older software, and I don’t necessarily need all the newer features.

Sometimes I buy a version or two older than the newest to save some money, but that’s another article.

Now obviously Windows has a built-in extraction tool, but I will harken back to the WinZip days… that’s why I go here to download the older version of WinZip. Version 7 is my favorite.

You can go to Oldversion.com and download a whole host of older applications.
