InnoDB Full-text Search in MySQL 5.6: Part 3, Performance

This is part 3 of a 3 part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

CREATE TABLE dir_test (
  full_name VARCHAR(100),
  details TEXT

The table definition for the SEO data set is:

CREATE TABLE seo_test (
 title VARCHAR(255),

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.

Engine Data Set one-pass (load) two-pass (load, alter)
MyISAM SEO 3.91 3.96 (0.76, 3.20)
InnoDB SEO 3.777 7.32 (1.53, 5.79)
MyISAM DIR 43.159 44.93 (6.99, 37.94)
InnoDB DIR 330.76 56.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

Engine Data Set FT Index Create Time FT Index Drop Time
MyISAM SEO 6.34 3.17
InnoDB SEO 3.26 0.01
MyISAM DIR 74.96 37.82
InnoDB DIR 24.59 0.01

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

1. SELECT id, title, MATCH(title, body) AGAINST ('arizona business records'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3
2. SELECT id, title, MATCH(title, body) AGAINST ('corporation commission forms'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3 DESC
   LIMIT 5;
3. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson +arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
4. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
5. SELECT id, full_name, MATCH(full_name, details) AGAINST ('"Thomas B Smith"'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 1;

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query # Engine Min. Execution Time Avg. Execution Time Max. Execution Time
1 MyISAM 0.007953 0.008102 0.008409
1 InnoDB 0.014986 0.015331 0.016243
2 MyISAM 0.001815 0.001893 0.001998
2 InnoDB 0.001987 0.002077 0.002156
3 MyISAM 0.000748 0.000817 0.000871
3 InnoDB 0.670110 0.676540 0.684837
4 MyISAM 0.001199 0.001283 0.001372
4 InnoDB 0.055479 0.056256 0.060985
5 MyISAM 0.008471 0.008597 0.008817
5 InnoDB 0.624305 0.630959 0.641415

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

mysql [localhost] {msandbox} (test): SELECT id, full_name, MATCH(full_name, details) AGAINST
('"Thomas B Smith"' IN BOOLEAN MODE) AS score FROM dir_test_innodb ORDER BY 3 DESC LIMIT 1;
| id    | full_name      | score             |
| 62633 | Thomas B Smith | 32.89915466308594 |
1 row in set (0.31 sec)
mysql [localhost] {msandbox} (test): show profile;
| Status                  | Duration |
| starting                | 0.000090 |
| checking permissions    | 0.000007 |
| Opening tables          | 0.000017 |
| init                    | 0.000034 |
| System lock             | 0.000012 |
| optimizing              | 0.000008 |
| statistics              | 0.000027 |
| preparing               | 0.000012 |
| FULLTEXT initialization | 0.304933 |
| executing               | 0.000008 |
| Sending data            | 0.000684 |
| end                     | 0.000006 |
| query end               | 0.000006 |
| closing tables          | 0.000011 |
| freeing items           | 0.000019 |
| cleaning up             | 0.000003 |

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

The post InnoDB Full-text Search in MySQL 5.6: Part 3, Performance appeared first on MySQL Performance Blog.


Let’s talk about Percona Server 5.6: Enterprise Grade MySQL (webinar)

Percona Server 5.6: Enterprise Grade MySQL

The new Percona Server 5.6 is the most manageable, highest performance, and most scalable version of MySQL available. Percona Server 5.6 is the best open source MySQL choice for enterprise-grade applications because it combines new features with the best features of Percona Server 5.5 and MySQL 5.6 to provide unparalleled performance.

Join me tomorrow as I explain how Percona Server 5.6 takes MySQL performance to new heights. In this webinar, aptly titled “Percona Server 5.6: Enterprise Grade MySQL,” I’ll compare Percona Server 5.6 to Percona Server 5.5 and MySQL 5.6, highlighting key differences between the two versions. I’ll also compare Percona Server 5.6 to MySQL 5.6, including a review of top deprecated features and their optimized replacements, enhancements to performance monitoring, slow query log, XtraDB, and other groundbreaking features. Additionally, I’ll also show you exceptionally faster backups aided by bitmap-based incremental backups when used in combination with Percona XtraBackup

At the end of this webinar you’ll understand the differences between Percona Server 5.6, Percona Server 5.5 and MySQL 5.6 and how those differences make Percona Server 5.6 the most manageable, highest performance, and most scalable version of MySQL available today. This will be an interactive presentation to please feel free to ask questions both here and during the webinar.

WHEN: Wednesday, July 31, 2013  1 p.m. Pacific Standard Time. Register here.

The post Let’s talk about Percona Server 5.6: Enterprise Grade MySQL (webinar) appeared first on MySQL Performance Blog.


Your troubles are over, pt-heartbeat 2.1

Poor pt-heartbeat 2.1: it was perfectly fine through Percona Toolkit 2.1.7, but it’s had a rough life since. Its troubles are finally over as of Percona Toolkit 2.1.10. At a conference recently, a user joked to me that “Percona Toolkit is awesome, but you really broke pt-heartbeat.” It’s true, and if you haven’t heard or experienced the story, here it is for the record.

Since the dawn of time, pt-heartbeat had computed slave lag like:

t1=time (Perl) --> replicate --> t2=time (Perl); lag = t2 - t1

Surprisingly, that worked for many years despite what should have been an obvious problem: different time zones: t1 could be in PST and t2 in EST and even if replication took effectively zero seconds, lag would be reported because 00:00 PST (t1) is 03:00 EST (t2).

This problem was not reported until bug 886059 in PT 2.1.8. The solution jumped out at us: use only UTC timestamps. We did, and this is when the troubles of pt-heartbeat 2.1 began.

The change in 2.1.8 broke pt-heartbeat in two ways. First, only using UTC timestamps, made 2.1.8 backwards-incompatible with 2.1.7 and earlier versions. We’re usually very attentive to maintaining backwards-compatibility within minor versions, but our failing in this case was the mistaken assumption that everyone running 2.1 would upgrade all their servers at once. This is not true: people run mixes of minor versions, relying on developers to maintain backwards-compatibility within the minor version. So those running 2.1.8 with 2.1.7 or earlier versions of pt-heartbeat ran into the very problem we were trying to fix: different time zones, because 2.1.8 used only UTC whereas 2.1.7 and earlier versions did not.

The second break was switching from Perl time to MySQL time which caused a loss of precision. Most MySQL servers have only 1-second timestamps, like 2013-06-28 00:00:01, but Perl time (when using the Time::HiRes module) gives us microsecond timestamps like 2013-06-28 00:00:01.123456. Microsecond timestamps are required to achieve what the tool’s docs state:

pt-heartbeat has a maximum resolution of 0.01 second.

So those running 2.1.8 ran into a complete loss of high-precision slave lag readings; this made the tool pretty much useless because MySQL already has 1-second slave lag resolution: seconds_behind_master (even though the way it computes that value is different than how pt-heartbeat computes slave lag).

We fixed those two bugs in PT 2.1.9, bug 1099665 and bug 1103221 respectively, by adding the –utc option. Now pt-heartbeat 2.1.9 was back to its original 2.1.7 self: microsecond timestamps from Perl were used again, and users could specify –utc to use UTC timestamps. Unfortunately, the troubles of pt-heartbeat 2.1 did not end there.

Certain MySQL functions are “time zone aware”, like UNIX_TIMESTAMP() for which the docs state:

The server interprets date as a value in the current time zone and converts it to an internal value in UTC.

So although we wrote UTC timestamps, we read them back in whatever the system time zone was. The fix: SET time_zone=’+0:00′ to truly make everything UTC. This was bug 1163372: “pt-heartbeat –utc –check always returns 0″. Given this change and those before it, bug 1137556, “pt-heartbeat docs don’t account for –utc”, was perhaps inevitable. Not only had we been breaking and fixing pt-heartbeat’s handling of time, but reading the docs would have given the wrong information, too.

Finally, those bugs have been fixed in Percona Toolkit 2.1.10 and I think it’s ok to say now: pt-heartbeat 2.1, your troubles are over. 2.1.10 will probably be the last release in the 2.1 series; users should upgrade to 2.2. pt-heartbeat 2.2.4 and later have all the fixes of pt-heartbeat 2.1.10.

Sorry pt-heartbeat 2.1, and sorry users of pt-heartbeat 2.1. We’ve learned from this tragic tale, and we hope never to repeat it.

The post Your troubles are over, pt-heartbeat 2.1 appeared first on MySQL Performance Blog.


Dictating your Novel

MicI’ve been experimenting with Dragon Dictate software. This is usually recognized to be the best commercially available speech recognition system and you can buy it for Windows or Mac for under $100. I wanted to see how difficult it would be to dictate my novel. Wouldn’t it be great to lay on the couch with a headset?

I was extremely impressed with the quality right out of the box. The software took me through a handful of short known sentences so it can tune to your voice. I could keep doing this for as long as I wanted and even go back later and train it in phrases that it repeatedly got wrong, but for the sake of laziness I decided to do the minimum amount of training. It did remarkably well with my English accent and the fact that I’m not a clear speaker.

I spoke at a normal conversational speed, just as if it were another person. It’s very natural. Typically nothing appeared on the screen until I said comma or period, or some other punctuation. The recognition engine likes to obtain the full context of a sentence or clause before translating, A fraction of a second later, the entire text appeared. I didn’t have to pause – I just kept going.

Let’s look at a couple of samples. I actually recorded myself speaking these snippets from my upcoming book, but chose not to include them since I sound horrible when recorded. Ugh!  Maybe I need elocution lessons – or a better mic

Here’s the text I read:

Moving faster than a Djinn out of a bottle, one of the creatures leapt up onto the roof of a dormer window that overhung the street. Worn tiles slid and crashed to the ground. It sprang again, pushed off the wall and landed beside the man. Talon-like fingernails flashed in the lantern light, and the wight raked the man’s forearm, shredding it.

And here’s how it emerged from Dragon Dictate:

Moving faster than a gene out of a bottle, one of the creatures leapt up onto the roof of a dorm window that overhung the street. One tile Slate and crashed to the ground. It sprang again, pushed off the wall and landed beside the man. Talent like fingernails flashed in the lantern light, and the white rate to the man’s forearm, shredding it.

Not bad! You can see exactly why it went wrong, largely because of a lack of knowledge of a creature called a wight and the pronunciation of a couple of words.


Here’s another sample:

“I want to be a necromancer.” Her eyes locked on mine.

“Right. Do you even know what one is?”

She rolled her eyes. “Everyone knows what you do, though I bet only half of the stories are true.”

“It’s dirty and dangerous and not at all becoming for a girl.”


And how it came out:

 ”I want to be a necromancer.” Her eyes locked on mine.

“Right. Do you even know what one is?”

She rolled her eyes. “Everyone knows what to do, though I bet only half of the stories are true.”

“It’s dirty and dangerous and not at all coming for a girl.”

Almost completely perfect.


That second piece was trickier because I had to say open quote and close quote, and this is one of the things that made it awkward to use. After hours of practice,  I remembered most of the time (and went back and added the missing ones later), but it definitely broke my concentration. I had to say new line for paragraph breaks too. I could edit by telling it to select a word/phrase and then to replace or insert, but since I had to proofread it anyway, I found it easier to make fixes using the keyboard. It was fun dictating a page or so and then going back to clean it up, but it definitely took discipline.

This leads to my final point. Apart from the overhead of the extra words (which I think could be overcome after days or weeks of practice), I just couldn’t think verbally. Neural pathways have been strengthened between the creative parts of my brain and my fingers, and that’s how I have trained my body to write. It just wasn’t natural to dictate. I had expected it to be like having a conversation, but I suspect that during the act of transcribing our creativity, our eyes are subtly scanning the paragraph and lines we have written to retain context – sort of keep our mental place. Dictating took more conscious effort (perhaps because it’s unnatural) and I regularly lost my place or forgot what I had just said. I suspect this would be even worse if I had attempted to dictate into my iPhone away from my computer, without the visual cue of the screen.

So much for my dream of dictating my novel on my drive to work.

I have however found the perfect use for it. For me, it works great when I want to describe setting and mood. I put on my headset, close my eyes and just say what I picture in my mind. It works great for description like that. Dialog, not so much.



Checking B+tree leaf nodes list consistency in InnoDB

If we have InnoDB pages there are two ways to learn how many records they contain:

  • PAGE_N_RECS field in the page header
  • Count records while walking over the list of records from infimum to supremum

In some previous revision of the recovery tool a short summary was added to a dump which is produced by the constraints_parser.

But if a page is lost and page_parser hasn’t found it, all records from this page are lost. In other words per-page recovery statistics gives us little idea about whether or not a recovered table is complete.

To cover this flaw a new tool index_check is introduced in the revision 80

As you might know InnoDB stores a table in a clustered index called PRIMARY.

The PRIMARY index is a B+tree structure. It has a nice feature that all leaf nodes have pointers to a next leaf node, so following the pointers we can read whole index in the primary key order. InnoDB extends the structure and stores also a pointer to the previous node. The previous node of the head is NULL as well as the next node of the tail.

Based on this knowledge we can check if we have all elements of the list. If we can get from any InnoDB page to the beginning and the end of the list then our set of pages is complete.

This is exactly what index_chk does – it reads files from a directory with InnoDB pages (that is produced by page_parser) and tries to walk over the list of pages back and forth.

Let’s review an example. I took some corrupt InnoDB tablespace and split it with a page_parser:

# ./page_parser -f /var/lib/mysql/ibdata1
Opening file: /var/lib/mysql/ibdata1
File information:
ID of device containing file:        64512
inode number:                     27037954
protection:                         100660 (regular file)
number of hard links:                   1
user ID of owner:                      107
group ID of owner:                     116
device ID (if special file):             0
blocksize for filesystem I/O:         4096
number of blocks allocated:          86016
time of last access:            1374662334 Wed Jul 24 06:38:54 2013
time of last modification:      1374233938 Fri Jul 19 07:38:58 2013
time of last status change:     1374233938 Fri Jul 19 07:38:58 2013
total size, in bytes:             44040192 (42.000 MiB)
Size to process:                  44040192 (42.000 MiB)
8.26% done. 2013-07-24 08:39:58 ETA(in 00:00 hours). Processing speed: 3637248 B/sec
95.70% done. 2013-07-24 08:40:09 ETA(in 00:00 hours). Processing speed: 4399053 B/sec

Now let’s take some index and check if it has all pages:

# ls pages-1374669586/FIL_PAGE_INDEX/0-410/
00000000-00000145.page  00000000-00000235.page  00000000-00000241.page  00000000-00000247.page  00000000-00000254.page
00000000-00000147.page  00000000-00000236.page  00000000-00000243.page  00000000-00000249.page  00000000-00000255.page
00000000-00000148.page  00000000-00000239.page  00000000-00000244.page  00000000-00000251.page
# ./index_chk -f pages-1374669586/FIL_PAGE_INDEX/0-410
Couldn't open file pages-1374669586/FIL_PAGE_INDEX/0-410/00000000-00000140.page

Bad news, a page with id 140 is missing!

Indeed the previous page before page#145 is page#140, but it’s missing.

# hexdump -C pages-1374669586/FIL_PAGE_INDEX/0-410/00000000-00000145.page | head -2
00000000  d4 cd 68 41 00 00 00 91  00 00 00 8c ff ff ff ff  |..hA............|
00000010  00 00 00 00 2b 6c ea 90  45 bf 00 00 00 00 00 00  |....+l..E.......|

For a cheerful ending let’s check an index that has all pages:

# ls pages-1374669586/FIL_PAGE_INDEX/0-2/* # this is SYS_COLUMNS table
pages-1374669586/FIL_PAGE_INDEX/0-2/00000000-00000010.page  pages-1374669586/FIL_PAGE_INDEX/0-2/00000000-00002253.page
# ./index_chk -f pages-1374669586/FIL_PAGE_INDEX/0-2

Thus, a table is fully recovered if and only if two conditions are met:

  • Command grep “Lost records: YES” table.dump | grep -v “Leaf page: NO” outputs nothing
  • ./index_chk -f pages-1374669586/FIL_PAGE_INDEX/<inde_id of=”” my=”” table=””> reports OK

The post Checking B+tree leaf nodes list consistency in InnoDB appeared first on MySQL Performance Blog.


Talking Drupal #007 – Distributions


– What is a Drupal Distribution?

– Who would use a Distribution?

– Examples of Distributions

– What can be done with a Distribution?

– Problems with using a Distribution as a starting point

– What makes up a Distribution?

– Sharing a Distribution on Drupal.org

– Using a Distribution for your own development

Module of the Week 

– Fast Permissions Administration


– Drupal Distribution Page – https://drupal.org/project/distributions

– Open Public – https://drupal.org/project/openpublic

– Commerce Kickstart – https://drupal.org/project/commerce_kickstart

– Drupal Licensing – https://drupal.org/licensing/faq&


– Stephen Cross – www.ParallaxInfoTech.com @stephencross

– Jason Pamental – www.hwdesignco.com @jpamental

– John Picozzi – www.RubicDesign.com @johnpicozzi 

– Nic Laflin – www.nLightened.net @nicxvan


Talking Drupal #007 – Distributions


– What is a Drupal Distribution?

– Who would use a Distribution?

– Examples of Distributions

– What can be done with a Distribution?

– Problems with using a Distribution as a starting point

– What makes up a Distribution?

– Sharing a Distribution on Drupal.org

– Using a Distribution for your own development

Module of the Week 

– Fast Permissions Administration


– Drupal Distribution Page – https://drupal.org/project/distributions

– Open Public – https://drupal.org/project/openpublic

– Commerce Kickstart – https://drupal.org/project/commerce_kickstart

– Drupal Licensing – https://drupal.org/licensing/faq&


– Stephen Cross – www.ParallaxInfoTech.com @stephencross

– Jason Pamental – www.hwdesignco.com @jpamental

– John Picozzi – www.RubicDesign.com @johnpicozzi 

– Nic Laflin – www.nLightened.net @nicxvan


About Optimizer Hints

Quite often I’m asked what I think about query hints. The answer is more lengthy and probably also more two-fold than most people expect it to be. However, to answer this question once and forever, I though I should write it down.

The most important fact about query hints is that not all query hints are born equally. I distinguish two major types:

Restricting Hints

Most query hints are restricting hints: they limit the optimizers’ freedom to choose an execution plan. “Hint” is an incredibly bad name for these things as they force the optimizer to do what it has been told—probably the reason MySQL uses the FORCE keyword for those.

I do not like restricting hints, yet I use them sometimes to test different execution plans. It usually goes like this: when I believe a different execution plan could (should?) give better performance, I just hint it to see if it really gives better performance. Quite often it becomes slower and sometimes I even realize that the execution plan I though of does not work at all—at least not with the database I’m working at that moment.

Typical examples for restricting query hints are hints that force the database to use or not use a particular index (e.g., INDEX and NO_INDEX in the Oracle database, USE INDEX and IGNORE INDEX in MySQL, or INDEX, FORCESEEK and the like in SQL Server).

So, what’s wrong with them? Well, the two main problems are that they (1) restrict the optimizer and that they (2) often need volatile object names as parameters (e.g., index names). Example: if you use a hint to use index ABC for a query, the hint becomes ineffective when somebody changes the name of the index to ABCD. Further, if you restrict the optimizer you can no longer expect it to adjust the execution plan if you add another index that servers the query better. Of course there are ways around these problems. The Oracle database, for example, offers "index description" hints to avoid both issues: instead of specifying the index name, it accepts a description of the ideal index (column list) and it selects the index that matches this definition best.

Nevertheless, I strongly recommend against using restricting query hints in production. Instead you should find out why the optimizer does “the wrong thing”? and fix the root cause. Restricting hints fix the symptom, not the cause. That being said, I know that there is sometimes no other reasonable choice.

Supporting Hints

The second major type of query hints are supporting hints: they support the optimizer by providing information it doesn’t have otherwise. Supporting hints are rare—I’m only aware of a few good examples and the most useful one has already become obsolete: it’s FAST number_rows (SQL Server) and FIRST_ROWS(n) (Oracle). They tell the optimizer that the application plans to fetch only that many rows of the result. Consequently, the optimizer can prefer using indexes and nested loop joins that would be inefficient when fetching the full result (see Chapter 7, Partial Results for more details). Although being kind-of obsolete, I’m still using these hints as the defining example for supporting hints because they provide information the optimizer cannot have otherwise. This particular example is important enough that it was worth defining new keywords in the ISO SQL:2008: FETCH FIRST … ROWS ONLY and OFFSET. That’s why this hint is a very good, yet obsolete example for supporting query hints.

Another example for supporting hints is the (undocumented) CARDINALITY hint of the Oracle database. It basically overwrites the row count estimate of sub-queries. This hint was often used if the combined selectivity of two predicates was way off the product of the selectivity of each individual predicate (see Combined Selectivity Example). But this hint is also outdated since Oracle database 11g introduced extended statistics to cope with issues like that. SQL Server’s filtered statistics serve the same purpose. If your database cannot reflect data correlation in it’s statistics, you’ll need to fall back to restricting hints.

The Oracle hint OPT_ESTIMATE is somehow the successor of the CARDINALITY hint for cases when the estimations are still off. Pythian wrote a nice article about OPT_ESTIMATE.

Combined Selectivity Example

Let’s say we have two Y/N columns and each has a 50:50 distribution. When you select using both columns most optimizers estimate that the query matches 25% of the table (by multiplying two times 50%). That means that the optimizer assumes there is no correlation between those two columns.

Column 1 Column 2 count(*)
Y Y 25
Y N 25
N Y 25
N N 25

If there is a correlation, however, so that most rows that have Y in one column also have Y in the other column, then the estimate is way off.

Column 1 Column 2 count(*)
Y Y 49
Y N 1
N Y 1
N N 49

If you query one of the rare Y/N combinations, the optimizer might refrain from using an index due to the high cardinality estimate. Nevertheless, it would be better to use the index because this particular combination is very selective.

It think supporting hints are not that bad: they are just a way to cope with known limitations of the optimizer. That’s probably why they tend to become obsolete when the optimizers evolve.

And Then There Was PostgreSQL

You might have noticed that I did not mention PostgreSQL. It’s probably because PostgreSQL doesn’t have query hints although it has (which are actually session parameters). Confused? No problem, there is a short Wiki for that.

However, to see some discussion about introducing a similar hint as CARDINALITY described above or implementing "cross column statistics" read the first few messages in this thread from February 2011 (after the first page, the discussion moves to another direction). And the result? PostgreSQL still doesn’t have a good way to cope with the original problem of column correlation.

If you like my way to explain things, you’ll love SQL Performance Explained.

Original title and author: “About Optimizer Hints” by Markus Winand.


Percona Monitoring Plugins 1.0.4 release: Now with Galera graphs

Percona is glad to announce the release of Percona Monitoring Plugins 1.0.4. The components are designed to integrate seamlessly with widely deployed solutions such as Nagios and Cacti, and are delivered in the form of templates, plugins, and scripts.



* Added Galera/MySQL Monitoring Template for Cacti
* Added “Disk Read/Write Time per IO Request (ms)” graph
* Added “MySQL InnoDB Buffer Pool Efficiency” graph
* Switched ss_get_mysql_stats.php to PHP MySQLi extension and made it working with SSL (bug 1193097)
* Added user info to the max_duration check of pmp-check-mysql-innodb plugin (bug 1185513)
* ss_get_mysql_stats.php default values for ‘$status’ array were null instead of 0 (bug 1070268)
* Introduction of innodb_read_views_memory overrode the InnoDB total memory allocated output in Cacti (bug 1188519)
* ss_get_by_ssh.php parsed MongoDB counters incorrectly when replica is set (bug 1087073)
* Cacti graph “Redis Unsaved Changes” was empty for Redis 2.6 (bug 1110372)
* Comparison of status variables that are strings didn’t work with pmp-check-mysql-status (bug 1191305)
* pmp-check-mysql-processlist always showed 0 for “copy to table” counter (bug 1197084)
* percona-nagios-plugins package failed to install on Debian Squeeze when debsums is installed (bug 1194757)

A new tarball is available from downloads area or RPM and DEB packages from our software repositories. The plugins are fully supported for customers with a Percona Support contract and free installation services are provided as part of some contracts. In addition as part of Percona’s Remote DBA installation and setup of these tools are included with our services. You can find links to the documentation, forums and more at the project homepage.

The post Percona Monitoring Plugins 1.0.4 release: Now with Galera graphs appeared first on MySQL Performance Blog.


Growth of Percona software releases

It was once said that “real artists ship.” In looking over the history of Percona software releases, we are currently shipping more software than ever before.

First, let’s look at Percona Server. Let’s look at all major versions: 5.1, 5.5 and 5.6 as well as the total for each year. The estimate for 2013 comes from assuming the second half of 2013 is similar to the first half.

Percona Server releases per year

In 2011, when Percona Server 5.5 came along, we see a sharp reduction in Percona Server 5.1 releases (remember that there’s at least one Percona Server 5.1 release for each Oracle MySQL 5.1 release, and this reduction is likely a reflection of the reduction of releases from Oracle). In 2011, 2012 and 2013 we see a pretty steady number of Percona Server 5.1 releases. It seems that 5.1 is not going anywhere yet.

For Percona Server 5.5, we see an increase in 2012, which likely mirrors an increased interest in Percona Server 5.5 in production environments with added feature and bug fix releases.

For Percona XtraBackup, we have a similar story of an increase in the number of releases we’ve been making.

Percona XtraBackup releases per year

You can clearly see the switch between stable versions of 1.6.x to 2.0.x and now with 2.1.x. There has been a new major version of Percona XtraBackup each year since 2010 and for 1.6 and above, there have been maintenance releases into the following year.

It’s also clear that we’ve dramatically increased the number of Percona XtraBackup releases each year. This allows earlier access to new features (in alpha and beta releases) and earlier access to bug fixes (point releases such as 2.1.4).

If we look across the entire Percona software range, there is a quite noticeable increase in the number of releases year-on-year. This graph is again only up to July 23rd, so the 2013-estimate is based on average releases per month.

Percona Software releases per year

It’s quite amazing to see the increase from 2011, where we shipped 24 releases, to 2012 where we shipped 56 (more than one a week) to this year where we’ve already shipped 38 and we could ship 64.

This growth trend isn’t possible without both growing the development team and continually improving development practices, process and organization. Over the past two years we’ve made many changes to how we develop and release software. We’ve automated nearly everything (and this will be a topic for a future post) and distributed the workload of things that cannot be automated without major advancements in AI.

Personally, I’m really quite amazed when faced with the hard numbers. I’m also incredibly proud of everyone involved in making every release happen.

The post Growth of Percona software releases appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com