Webinar: Best Practices for MySQL Scalability on May 1

"MySQL 5.6: Advantages in a Nutshell." March 6 at 10 a.m. PST

“Best Practices for MySQL Scalability.”

If you have not already done so, I encourage you to register for my “Best Practices for MySQL Scalability” Webinar which will take place on May 1st at 10 a.m. PST. This will be an overview presentation, led by me and providing a high-level look at the components of MySQL scalability: application architecture, MySQL version and configuration, choosing hardware and operating systems. For each area we’ll investigate the most important best practices. Talk to you on Wednesday, and remember to prepare your questions in advance to get the most value out of the Webinar!


More info: MySQL scalability depends on getting many things right including the architecture, hardware, operating system, MySQL version, MySQL configuration, schema design and indexing, and query design. To avoid having any one of them become the bottleneck that limits the scalability of the entire system, you need to follow best practices in all of these technology areas. Each area deserves its own webinar so we won’t go into the trenches to explore every one in depth. Instead, we will provide an overview of MySQL scalability which highlights the most important considerations and best practices for each of these areas. Register now!

The post Webinar: Best Practices for MySQL Scalability on May 1 appeared first on MySQL Performance Blog.


Are self-publishing and ebooks destroying our bookstore heritage?

BarnesNobleThere has been considerable press for years about what effect self-publishing and ebooks are having on our literary culture, be it bookstores and libraries or even literary masterpieces themselves. I’m sure you’ve all heard the rhetoric that allowing “normal” folk to publish whatever they want with no agent or publisher to act as a gatekeeper for quality, will result in a deluge of crap flooding the markets. On the face if it, it’s a valid concern; no one wants to wade through junk to find a gem of entertainment, any form of entertainment.

Ebooks and self-publishing hit mainstream consciousness about 5 years ago, give or take. Ask yourself: Do you find yourself buying dross time and again and being unable to find anything decent to read? I would hazard to guess that this is NOT your situation, or indeed anyone’s. Survey after survey have shown that the single most important factor influencing whether you buy a book is word of mouth. Did your family and friends enthuse about it? Have you read dozens or hundreds of reviews recommending it? There are far too many books in the world for you to just pick one up on spec, with no reader feedback, and decide to try it. How often do you do that?


This, of course, makes it much harder for we, the new authors, to make our mark. We represent that last scenario: an author you’ve never heard of with little to recommend us. But even as an author struggling to reach a wider audience, I argue that this is ok. Really. Respect is earned. With every book I write, first a couple of hundred, then thousands of readers will try me out, like what they read and review my books favorably, and – hopefully – tell all their friends. It’s an apprenticeship system. Sure, some debut authors break out big, but for every one of them, there are likely 10,000 of us slowly building our reputation. It forms a self-filtering system to keep out all the crap that doomsayers are so concerned about. If I write crap then who will buy my next book? Who will recommend it? It’s literary Darwinism.

Over the last year or two, Scott Turow has dished out considerable damnation of self-publishing and ebooks, claiming they are destroying our literary heritage. Turow, for those unfamiliar, is the President of the Author’s Guild. I won’t even try to debunk his fanciful naysaying here, suffice to say that numerous writers more eloquent than myself have done so: Barry Eisler, David Gaughran, Joe Konrath, Forbes magazine. It does seem as if, far from representing all authors, large and small, as you’d expect the head of the Author’s Guild to do, he is simply trying to preserve the old way of life for the elite bestsellers paid substantial sums by publishing companies.

No, I’m not going to trash publishers, agents, nor traditionally published authors. I just happen to believe, like a growing number of authors, that self-publishing and ebooks simply ADD to our literary heritage. Having already debunked the “more-is-crap” myth above, then having more books, more authors, more ideas in more formats can only be a good thing for everyone. A very good thing. This is what freedoms are all about: seek an agent and a contract with a large publisher, or do it yourself. Is one better than the other? No. Is one easier than the other? No.

A couple of weeks ago, James Patterson, yes THE James Patterson, took out a full page ad on the rear of the NYT Review, in which he stated ,in no uncertain terms, that self-publishing and ebooks are destroying bookstores, libraries and the very concept of classic literature! Wow. The sky is falling! He went on to suggest that the government bail out the publishing industry like it did the auto industry. **Speechless**. Patterson is a damn fine author, but I just lost respect for him. Here’s his defense of that ad in Salon.

Didn’t we hear similar arguments when music moved to MP3′s and iTunes? Wasn’t that supposed to destroy our rich culture of music and fill it with crap, destroying the industry? And, yeah, it did. No wait, no it didn’t. Music today is vibrant with an eclectic mix of major bands, solo artists, garage bands and Indies, and the consumer is sucking up content and reveling in the choice. Sure, we no longer see gargantuan music stores like Tower and Virgin but does that really matter?  OK, one can probably prove that ebooks are leading to the closure of big-chain bookstores, and we can argue that bookstores are different than music stores, because there is that culture of wandering around, touching books, drinking coffee and reading in easy chairs. But there is absolutely no reason we can’t enjoy that today if companies would simply embrace change and adapt. What’s wrong with wandering into Barnes & Noble with your ereader, sinking into a chair, downloading samples of books to try, while sipping your latte? As for boutique, used bookstores, they will likely remain for decades to come, in the same way that vinyl music stores do. Some bands are even releasing new material on vinyl for the discerning audiophile. I’m sure there will always be a demand for gorgeously produced books, maybe first or special editions and omnibuses. Those little mom and pop bookstores are likely to survive longer than the big chains, and that’s a good thing. In my experience, the little stores are run by bibliophiles for the love of books, whereas chains are run for pure profit.

Amazon always seems to be the scapegoat for claims that ebooks are destroying paper books and bookstores. Ask yourself just why this “demon of literary destruction” called Amazon became so popular? Could it be that their prices were more reasonable, their selection greater, and the fact that most people probably don’t want to traipse down to their bookstore – they want to download their book right now, while they are reading in bed or in the garden? If people didn’t want Amazon’s services, it wouldn’t be the giant it is today, as simple as that. It is part of the human condition to lament “the good old days”, but that doesn’t always gel with our actual actions. Amazon moves with the times too – it embraced ebooks and then self-publishing, and, again, these things have become so rabidly popular among authors and readers because that is what people want.

I’d love to hear what you think? Are self-publishing and ebooks destroying our literary heritage? Is it all going to end in tears and a sea of dross, or a new era of choice and fresh talent?



More on MySQL transaction descriptors optimization

Since my first post on MySQL transaction descriptors optimization introduced in Percona Server 5.5.30-30.2 and a followup by Dimitri Kravchuk, we have received a large number of questions on why the benchmark results in both posts look rather different. We were curious as well, so we tried to answer that question by retrying benchmarks on various combinations of hardware and dataset sizes, including the ones that are as close as possible to Dimitri’s environment. To put a long story short, the results are fairly consistent with our original post across all test combinations. The details are below.

Here is a brief context recap. There are two cases covered in the first post:

  1. single SELECT queries doing PRIMARY KEY lookups (aka QPS sysbench mode);
  2. same MySQL queries executed inside single-statement transactions (TPS mode, see the original post on why this case is relevant).

Dimitri didn’t touch case #2 and focused only on case #1. Which is the perfect case for read-only transaction optimization in MySQL 5.6, because all SELECT queries in the AUTOCOMMIT mode are, by definition, read-only transactions, so the server deals with an empty transaction list when creating read views. However, the more general descriptors optimization in Percona Server showed fairly close results in our tests. In Dimitri’s tests Percona Server scalability is somewhere in between of MySQL 5.5 and MySQL 5.6.

In order to understand more about differences between results we have rerun tests on several our boxes. You can find setup/config details at the very end of this post.


All are NUMA boxes, but the second one has a higher QPI bandwidth, and the third one features even higher QPI speed (and thus, faster inter-process/node communication).


First of all, we used a smaller dataset for this benchmarks round, since reportedly Dimitri was using 8 tables with 1M rows each (about 2.1GB) for his tests, while in our original tests we used a much larger one, 16 tables with 5M rows each (about 23GB). So it makes sense to check if the dataset size makes any difference.

We first excluded NUMA from the equation. We ran tests on Dell PowerEdge R720 box and varied cpu combinations with taskset. The following chart shows a results comparison on a single-node (on the left) vs. two-node (on the right) configuration.


No surprises here, the general pattern is consistent with what we saw previously. Note that the single-node configuration used 16 threads (8 cores on a single socket x 2 threads per socket), while the second one use 16 cores (8 cores per socket per node). So this test also excludes HT as the culprit for the results difference.

We then compared results across all 3 boxes to see if different hardware has any notable impact:


Still no luck, everything is in line with the other tests. There is some minor variation, e.g. in the first two cases Percona Server performance is even slightly ahead of MySQL 5.6.10, but nothing unexpected.

So while the difference in results still requires an explanation, I’d like to highlight one aspect of the descriptors optimization that I thought was explained in the original post, but seems to cause much confusion.

Even though the read-only transactions optimization in MySQL 5.6 virtually eliminates contention on the mutex protecting the list of transactions, its applicability is quite limited. Namely, it requires the list of transactions to be either empty or very short, which basically implies a fully read-only server. As soon as there is a non-trivial amount of updates to the database, all SELECT queries, even those participating in read-only transactions, start suffering from the trx_list overhead created by concurrent updates.

Once we step away from this spherical read-only cow and throw some data updates into the mix, limitations of read-only transactions become obvious.

In the following test sysbench does 9 primary key SELECTs followed by a PK UPDATE statement in each thread in a loop, which is probably a bit closer to real life workloads. It’s still the AUTOCOMMIT mode, so SELECTs are read-only transactions, but it doesn’t help much, because e.g. for 1024 concurrent threads each SELECT has to scan about 100 update transactions to create a read view:


I hope this sheds more light on our tests setup and clarifies the scope of descriptors optimization.

Server Configuration:

innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_log_buffer_size = 16M
innodb_buffer_pool_size = 52G
innodb_log_file_size = 2000M
innodb_log_files_in_group = 2
innodb_file_per_table = true
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 2000


To warmup server and load data and indices to the buffer pool we use following queries:

select avg(id) from sbtest$i force key (primary)
select count(*) from sbtest$i WHERE  k like '%0%'



sysbench  --num-threads=<1..1024> --test=oltp.lua --oltp_tables_count=8 --oltp-table-size=1000000
--rand-init=on --report-interval=1 --rand-type=uniform --forced-shutdown=1 --max-time=120
--max-requests=0 --percentile=99 --mysql-user=root --mysql-db=sbtest8t1M
--mysql-table-engine=INNODB --mysql-socket=/tmp/mysql.sock
--oltp-point-selects=1 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0
--oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-test-mode=nontrx --oltp-read-only=off --oltp-index-updates=0 --oltp-non-index-updates=0 run


sysbench  --num-threads=<1..1024> --test=oltp.lua --oltp_tables_count=8 --oltp-table-size=1000000
--rand-init=on --report-interval=1 --rand-type=uniform --forced-shutdown=1 --max-time=120
--max-requests=0 --percentile=99 --mysql-user=root --mysql-db=sbtest8t1M
--mysql-table-engine=INNODB --mysql-socket=/tmp/mysql.sock
--oltp-point-selects=9 --oltp-simple-ranges=0 --oltp-sum-ranges=0 --oltp-order-ranges=0
--oltp-distinct-ranges=0 --oltp-skip-trx=on --oltp-test-mode=nontrx --oltp-read-only=off --oltp-index-updates=1 --oltp-non-index-updates=0 run

The post More on MySQL transaction descriptors optimization appeared first on MySQL Performance Blog.


Finally. How to verify if all MySQL records were recovered

How to verify if all MySQL records were recoveredAfter nearly every recovery case the same question arises: How many MySQL records were recovered and how many were lost.

Until now there was no way to answer the question without manual investigation. As it turned out a small change can make a big difference.

There are two ways to know how many records an InnoDB page stores. The index page has a header PAGE_N_RECS – this is a number of records the page stores. Obviously it doesn’t count any deleted records. The second method to confirm how many records are in the page is to follow records pointers – and count them.

As you might know, records inside an InnoDB page are organized in an unidirectional list. The list is sorted by primary key and starts with the internal record infinum and ends with another internal record supremum. If you follow record pointers from the infinum and end up with the supremum record then two conclusions can be make:

  1. The page structure is not corrupt
  2. You know how many records to expect

This is what constraints_parser does before it tries to find any records in the page:

./constraints_parser -5f pages-actor/FIL_PAGE_INDEX/0-1599/00000000-00000003.page -V
Checking a page
Infimum offset: 0x63
Supremum offset: 0x70
Next record at offset: 0x7F (127)
Next record at offset: 0xA8 (168)
Next record at offset: 0xCE (206)
Next record at offset: 0x70 (112)
Page is good
Format: COMPACT, Records list: Valid, Expected records: (200 200)

Here you can see “Expected records”: 200 in the header and 200 records were found by pointers.

Then the contraints_parser tries to recover the records. It checks whether fields have permitted values, etc. In the end it knows how many records were actually recovered.

This statistics are printed along with the recovered records:

-- Page id: 3, Format: COMPACT, Records list: Valid, Expected records: (200 200)
000000001917    DB000001720110  actor   1       "PENELOPE"      "GUINESS"       "2006-02-15 04:34:33"
000000001917    DB00000172011B  actor   2       "NICK"  "WAHLBERG"      "2006-02-15 04:34:33"
000000001917    DB000001720126  actor   3       "ED"    "CHASE" "2006-02-15 04:34:33"
000000001917    DB000001720131  actor   4       "JENNIFER"      "DAVIS" "2006-02-15 04:34:33"
000000001917    DB0000017209E4  actor   200     "THORA" "TEMPLE"        "2006-02-15 04:34:33"
-- Page id: 3, Found records: 200, Lost records: NO, Leaf page: YES

From the output above we know that 200 records were expected, 200 – recovered. So, no records were lost.

It’s worth noting that records are stored in the leaf pages only, so if an index has more than one page no records will be found in the root page.

For the root page it will print “Leaf page: NO” and it can be ignored.

The post Finally. How to verify if all MySQL records were recovered appeared first on MySQL Performance Blog.


Percona Server 5.5.30 with TokuDB for MySQL

Percona Server with TokuDB for MySQL

Percona Server with TokuDB

I was excited to see that TokuDB for MySQL from Tokutek is now open source. I believe this is a great technology and it was a significant restriction on the adoption when it was distributed only in a binary format with a proprietary license.

So I went and built Percona Server 5.5.30 with TokuDB’s storage engine. This is an experimental build and I am just going to play with it. The build procedure is somewhat complicated and confusing and this stage, so I want to share it with the entire MySQL community in case you want to try it out for yourselves.

It’s available from our TESTING download area. The source code used to build this binary is in the same directory.

The post Percona Server 5.5.30 with TokuDB for MySQL appeared first on MySQL Performance Blog.


Percona Server 5.5.30-30.2 rerelease fixes non-restart issue

Percona Server for MySQLIn our last 5.5 series release of Percona Server, we included a regression in the RPM packaging that prevented the server from restarting following an upgrade — instead, the server would remain stopped after the upgrade was completed regardless of its state before updating. This caused some problems for some users, especially if automatic upgrading was configured on machines with running database servers.

We’ve now fixed it and re-released to our downloads area and our RPM software repositories. Please note that this re-release only affects RPM so our dpkg and binary packages remain unchanged.

Percona XtraDB Cluster was also affected by this bug and will also have a re-release very soon.

Please refer to the bug #1168032 description for more details. Thanks to Mark McKinstry for reporting the bug and to David Six at Percona Live MySQL Conference and Expo for walking through this.

– More info on Percona Server 5.5.30-30.2 –

New Features:

  • Percona Server for MySQL has implemented priority connection scheduling for the Thread Pool. (Alexey Kopytov)
  • Percona Server for MySQL will now be shipped with the libjemalloc library. Benchmark showing the impact of memory allocators on MySQL performance can be found in this blogpost. (Ignacio Nin)
  • This release of Percona Server for MySQL has fixed a number of performance bugs. (Alexey Kopytov)
  • Drop table performance feature has been removed and its controlling variable innodb_lazy_drop_table has been deprecated. Feature has been removed because the upstream DROP TABLE implementation has been improved. (Laurynas Biveinis)

Bugs Fixed:

  • Due to parse error in the percona-server.spec Percona Server rpm packages couldn’t be built on RHEL 5 and CentOS 5. Bug fixed #1144777 (Ignacio Nin).
  • When mysqldump was used with --innodb-optimize-keys option it produced invalid SQL for cases when there was an explicitly named foreign key constraint which implied an implicit secondary index with the same name. Fixed by detecting such cases and omitting the corresponding secondary keys from deferred key creation optimization. Bug fixed #1081016 (Alexey Kopytov).
  • Percona Server for MySQL was built with YaSSL which could cause some of the programs that use it to crash. Fixed by building packages with OpenSSL support rather than the bundled YaSSL library. Bug fixed #1104977 (Ignacio Nin).
  • Running the DDL statement while variable innodb_lazy_drop_table was enabled could cause assertion failure. Bugs fixed #1086227 and #1128848 (Laurynas Biveinis).
  • Fixed yum dependencies that were causing conflicts in CentOS 6.3 during installation. Bugs fixed #1031427 and #1051874 (Ignacio Nin).
  • The log tracker thread was unaware of the situation when the oldest untracked log records are overwritten by the new log data. In some corner cases this could lead to assertion errors in the log parser or bad changed page data. Bug fixed #1108613 (Laurynas Biveinis).
  • Ported a fix from MariaDB for the upstream bug #63144. CREATE TABLE or CREATE TABLE IF NOT EXISTS statements on an existing table could wait on a metadata lock instead of failing or returning immediately if there is a transaction that executed a query which opened that table. Bug fixed #1127008 (Sergei Glushchenko).
  • Fix for bug #1070856 introduced a regression in Percona Server for MySQL 5.5.28-29.3 which could cause a server to hang when binary log is enabled. Bug fixed #1162085 (Alexey Kopytov).
  • Fixed upstream bug #49169 by avoiding the malloc call in the read_view_create_low() in most cases. This significantly improves InnoDB scalability on read-only workloads, especially when the default glibc memory allocator is used. Bug fixed #1131187 (Alexey Kopytov).
  • Removed trx_list scan in read_view_open_now() which is another problem originally reported as upstream bug #49169. This also provides much better scalability in InnoDB high-concurrent workloads. Bugs fixed #1131189 (Alexey Kopytov).
  • In the event that a slave was disconnected from the master, under certain conditions, upon reconnect, it would report that it received a packet larger than the slave_max_allowed_packet variable. Bug fixed #1135097 (George Ormond Lorch III).
  • Fixed the upstream bug #62578 which caused MySQL client to abort the connections on terminal resize. Bug fixed #925343 (Sergei Glushchenko).
  • Percona Server for MySQL would re-create the test database when using rpm on server upgrade, even if the database was previously removed. Bug fixed #710799 (Alexey Bychko).
  • Debian packages included the old version of innotop. Fixed by removing innotop and its InnoDBParser Perl package from source and Debian installation. Bug fixed #1032139 (Alexey Bychko).
  • UDF/configure.ac was incompatible with automake 1.12. Bug fixed #1099387 (Alexey Bychko).
  • Reduced the overhead from innodb_pass_corrupt_table value checks by optimizing them for better CPU branch prediction. Bug fixed #1125248 (Alexey Kopytov).
  • dialog.so used by the PAM Authentication Plugin couldn’t be loaded with Perl and Python clients when plugin-dir option was set in the [client] section of the my.cnf. Bug fixed #1155859 (Sergei Glushchenko).
  • Fixed the upstream bug #68845 which could unnecessarily increase contention on log_sys->mutex in write-intensive workloads. Bug fixed #1163439 (Alexey Kopytov).
  • Ported back from the upstream MySQL 5.6 the fix for unnecessary log_flush_order_mutex acquisition. Bug fixed #1163262 (Alexey Kopytov).
  • When mysqldump was used with --innodb-optimize-keys and --no-data options, all secondary key definitions would be lost. Bug fixed #989253 (Alexey Kopytov).
  • Warning about the Percona Toolkit UDFs was omitted when installing from Percona’s Debian repositories. Bug fixed #1015506 (Alexey Bychko).
  • Percona Server for MySQL was missing help texts in the MySQL client because the help tables were missing. Bug fixed #1041981 (Alexey Bychko).
  • Fixed the upstream bug #68197 that caused InnoDB to misclassify internal read operations as synchronous when they were actually asynchronous when Thread Pool feature was used. Bug fixed #1107539 (Sergei Glushchenko).
  • Suboptimal code for User Statistics feature has been optimized to make sure no additional work is done when userstat is disabled. Bug fixed #1128066 (Alexey Kopytov).

Other bug fixes: bug fixed #1146621 (Laurynas Biveinis), bug fixed #1050536 (Alexey Bychko), bug fixed #1144059 (Roel Van de Paar), bug fixed #1154962 (Hrvoje Matijakovic), bug fixed #1154959 (Hrvoje Matijakovic), bug fixed #1154957 (Hrvoje Matijakovic), bug fixed #1154954 (Hrvoje Matijakovic).

Release notes for Percona Server for MySQL 5.5.30-30.2 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

UPDATE[18-04-2013]: There was a RPM packaging regression introduced with the fix for bug #710799. This regression only affected clean RPM installations and not upgrades. We have pushed the fixed packages to the repositories. Bug fixed #1170024.

The post Percona Server 5.5.30-30.2 rerelease fixes non-restart issue appeared first on MySQL Performance Blog.


Real Ocean of Dust in Australia

I recently came across a fascinating post in The Mirror newspaper in the UK. It shows a dust storm raging across the ocean. Heh, not really an ocean of dust, but indicates how terrifying the dust storms must be on my world in my book, Ocean of Dust. Imagine being on a ship with this thing bearing down on you.   Full article

Photo: Brett Martin, Getty


Percona Monitoring Plugins 1.0.3 for MySQL now available

Percona is glad to announce the release of Percona Monitoring Plugins 1.0.3 for MySQL — high-quality components to add enterprise-grade MySQL monitoring and graphing capabilities to your existing in-house, on-premises monitoring solutions. The components are designed to integrate seamlessly with widely deployed solutions such as Nagios and Cacti, and are delivered in the form of templates, plugins, and scripts.



* MySQL 5.6 compatibility for InnoDB graphs (bug 1124292)
* Added performance data to Nagios plugins (bugs 1090145, 1102687)
* Added UTC option to pmp-check-mysql-replication-delay to be compatible with pt-hearbeat 2.1.8+ (bug 1103364)
* Added 1-second granularity to pmp-check-mysql-deadlocks (bug 1154774)
* Added package install/update instructions and other documentation updates (bugs 1139652, 1124200, 1015981)
* Updated documentation with the new Cacti sample images
* Updated “Network Traffic” to be blue and green and to show bits/sec (bug 1132900)
* Extended “MySQL Threads” graph with all kind of threads (bug 1157911)
* Some Cacti single-item graphs were broken due to cacti hexadecimal transformation (bug 1155513)
* Memcached graphs were broken when the wrong arguments for nc command are passed (bug 1155712)
* ss_get_by_ssh.php didn’t gather mongodb stats without SSH (bug 1050537)
* ss_get_by_ssh.php didn’t timeout commands that hang (bug 1160611)
* pmp-check-file-privs didn’t throw the proper error on directory permissions issue (bug 1024001)
* pmp-check-mysql-replication-running reported OK when a slave is in “Connecting” state (bug 1089506)

Update note: Cacti templates have to be re-imported together with the updating of ss_get_*.php scripts. Then make sure you rebuilt the poller cache under Cacti -> System Utilities. Also the following Cacti graphs need to be recreated: MySQL “MySQL Threads”, Linux “Network Traffic”.

A new tarball is available from downloads area or RPM and DEB packages from our software repositories. The plugins are fully supported for customers with a Percona Support contract and free installation services are provided as part of some contracts. In addition as part of Percona’s Remote DBA installation and setup of these tools are included with our services. You can find links to the documentation, forums and more at the project homepage.

The post Percona Monitoring Plugins 1.0.3 for MySQL now available appeared first on MySQL Performance Blog.


The two top performance problems caused by ORM tools

ORMs are not entirely useless…” I just tweeted in response to a not exactly constructive message that we should fire developers who want to use ORMs. But than I continued in a not exactly constructive tone myself when I wrote "…the problem is that the ORM authors don’t know anything about database performance”.

Well, I don’t think they “don’t know anything” but I wonder why they don’t provide decent solutions (including docs) for the two most striking performance problems caused by ORMs? Here they are:

The infamous N+1 selects problem

This problem is actually well-known and I believe there are working solutions in all recent ORMs. The problem that remains is about documentation and culture: although there are solutions, many developers are not aware of them or still live in the “joins are slow—let’s avoid them” universe. The N+1 selects problem seems to be on the decline but I still think ORMs’ documentation should put more emphasis on joins.

For me, it looks very similar to the SQL injection topic: each and every database access layer provides bind parameters but the documentation and books just show examples using literal values. The result: SQL injection was the most dangerous weakness in the CWE/SANS Top 25 list. Not because the tools don’t provide proper ways to close that hole, but because the examples don’t use them consistently.

The hardly-known Index-Only Scan

Although the Index-Only Scan is one of the most powerful SQL tuning techniques, it seems to be hardly known by developers (related SO question I was involved in recently). However, I’ll try to make the story short.

Whenever a database uses an index to find the requested data quickly, it can also use the information in the index to deliver the queried data itself—if the queried data is available in the index. Example:

     ( last_name  VARCHAR(255)
     , first_name VARCHAR(255)
     -- more columns and constraints

    ON demo (last_name, first_name);

SELECT last_name, first_name
  FROM demo
 WHERE last_name = ?;

If the database uses the IOS_DEMO index to find the rows in question, it can directly use the first name that is stored along with the last name in the index and deliver the queries’ result right away without accessing the actual table. That saves a lot of IO—especially when you are selecting more than a few rows. This technique is even more useful (important) for databases that use clustered indexes like SQL Server or MySQL+InnoDB because they have an extra level of indirection between the index and the table data.

Did you see the crucial prerequisite to make an Index-Only Scan happen? Or asking the other way around: what’s a good way to make sure you’ll never get an Index-Only Scan? Yes, selecting all columns is the most easy yet effective Index-Only-Scan-Preventer. And now guess who is selecting all the columns all the time on behalf of you? Your good old friend the ORM-tool.

This is where the tool support is really getting sparse. Selecting partial objects is hardly supported by ORMs. If it is supported then often in a very inconvenient way that doesn’t give runtime control over the columns you’d like to select. And for gods sake, let’s forget about the documentation for a moment. To say it straight: using Index-Only Scans is a pain in the a** with most ORMs.

Besides Index-Only Scans, not selecting everything can also improve sorting, grouping and join performance because the database can save memory that way.

What we would actually need to get decent database performance is a way to declare which information we are going to need in the upcoming unit of work. Malicious tongues might now say “that’s SQL!” And it’s true (in my humble opinion). However, I do acknowledge that ORMs reduce boilerplate code. Luckily they often offer an easier language than SQL: their whatever-QL (like HQL). Although the syntactic difference between SQL and whatever-QL is often small, the semantic difference is huge because they don’t work on tables and columns but on objects and classes. That avoids a lot of typing and feels more natural to developers from the object world. Of course, the whatever-QL needs to support everything we need—also partial objects like in this Doctrine example.

After all, I think ORM documentation should be more like this: first introduce whatever-QL as simplified SQL dialect that is the default way to query the database. Those methods that are currently mentioned first (e.g., .find or .byId) should be explained as a shortcut if you really need only one row from a single table.


High-load problems? Investigate them with ‘pt-query-digest’

Percona Toolkit for MySQLI had the chance to work on an interesting case last week, and I thought I’d share what I think is a little known goodie from Percona Toolkit for MySQL called pt-query-digest.

One customer was suffering from periods of high load on their database server, leading to degraded application performance, and sometimes even short moments of downtime. As usual with these cases, we set up pt-stalk to trigger a capture if Threads_running went above a known ‘good’ value for this workload, so we would have lots of diagnostics information to review even if we couldn’t be logged in to the server at the time the problem happened.

As I began reviewing available captures, I was able to correlate most load peaks with temporary increases in user activity, so it became a matter of figuring out if the server was hitting its capacity, or if there was another limiting factor in the workload that made it scale poorly. This is when the ‘SHOW FULL PROCESSLIST’ captures generated by pt-stalk come in handy, but if you’ve used them, you know it can be a daunting task to review them in servers with thousands of connections.

Enter pt-query-digest. The tool supports its share of format parsers, but the one that saved the day here is also the simplest: the ‘raw’ log parser. If you specify –format rawlog, then the tool expects the input to have one query per line, and it will do it’s aggregation job on that. Granted, this means there is no run time data available, but at least you get a nice ranking of queries sorted by the number of times they appear in processlist.

In this case, I extracted the first snapshot from each *-processlist capture (under normal circumstances, this file will have –run-time snapshots separated by timestamps) and ran that by pt-query-digest –type rawlog –report-format profile, to get just the ranking of queries. To my luck, one query stood out right away, always appearing many times more than any other one during peaks. Rewriting that query (actually the customer removed it altogether, by merging it with another query that ended up having a much better execution plan, but that is beyond the scope of this post!) made the problem go away. The server took the next round of peak traffic time without issues.

If you regularly work with MySQL databases, getting familiar with Percona Toolkit for MySQL will let you do more work, in a safer way, and in less time. If you need to get started, there are good recorded webinars on the subject, and the upcoming Percona Live conference’s tutorials and sessions will include a wealth on information on everything MySQL-related, including Percona Toolkit.

The post High-load problems? Investigate them with ‘pt-query-digest’ appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com