How Percona strives to remain neutral and independent

Many of the prominent companies in the MySQL ecosystem are Percona customers, including hardware manufacturers, software developers, hosted service providers, and appliance developers. We perform paid and unpaid research on their products, and we publish blog posts related to their products or services. Independence and objectivity are core Percona values. How do we balance the interests of all involved while maintaining our vendor-neutral stance? This is a very important topic for all of our current and potential customers to understand. It is essential that we set realistic expectations about how we’ll work with you, so that we don’t enter into a business relationship based on assumptions that could cause conflict.

The short version of the following lengthy blog post is that vendors can buy Percona’s time, but not our opinion or the use of our name.

We can separate customers into two main groups of interest for this blog post: customers who might be interested in a product or service that another customer provides, and customers who might be interested in selling a product or service to other Percona customers. We will call these groups “vendors,” and variously “consumers” or “prospects” at times for brevity.

Consumers: Percona customers needing products or services

Customers who need a product or a service can expect that Percona will make recommendations with as little bias as possible (but see also Peter’s post on bias) based on our knowledge of the product, regardless of whether the vendor is a Percona customer. For example, Virident worked with Percona to test and validate their PCI-E flash cards, and Fusion-IO sent us free cards to use as we wished. If we believe that a customer will benefit from these storage solutions, we might mention them, but we have no obligation to do so. We will reveal that the vendor is a Percona customer, and that we performed paid evaluations or that we were given free hardware to test.

If another vendor brings a good solution to market and isn’t a Percona client, we’ll also mention that product if we’re aware of it. We try to stay abreast of the latest technologies, but good research takes money and time, so we can’t always know about every product or service. We are naturally more likely to recommend products that we know well, and there should be nothing wrong with that as long as we disclose fully. If you ask us about solutions X and Y, and we say “I know X well, and it’s a good fit for you, but solution Y is not something I’ve tested,” then you can either take that advice as it is, or pay us to research solution Y.

It sometimes happens that a vendor finds out that Percona also works with a potential consumer of their product or service. This happens through the grapevine, or through direct interaction between the vendor and the consumer, e.g. a sales call. However, we do not play a matchmaker role for its own sake, and we do not make our customer list available to any other customers. We will not take the initiative to tell any vendor to contact you. If we think you’re a good candidate for a vendor’s solution, we’ll offer you an introduction if that’s the best course of action, or just work with you to assess the solution without any interaction with the vendor, if that is not necessary to give you the best service. In summary, you get the best service, and we’ll suggest an introduction only if it’s in your interests, not merely because it’s in the vendor’s interests.

Vendors: Percona customers with something to sell

If you’re a solution vendor, there are several benefits of hiring Percona to assess your product. First, Percona can become intimately familiar with it, which can enable us to suggest improvements both to the product and to your marketing or positioning, to better match the true needs of your potential clients. It can also enable us to give unbiased advice to our other clients. Finally, if you hire us to research your offering, we can mention the results in outlets such as blog articles, webinars, white papers, and conference speeches. This has proven to be very valuable marketing exposure for many customers.

The level of detail that you expose to us determines how knowledgeable we are about your product. If you’re selling something closed-source and don’t want us to see the source, that’s fine, but it may mean that we aren’t as confident about our knowledge of the product. The more we know, the more valuable we can be to you and to our clients who might purchase the solution.

Being a Percona customer does not grant you any editorial influence over what we publish. For example, we can publish blog articles about your solution independently of our work with you. If you’ve hired us to research your solution, you can expect that we will publish only informed opinions on the solution. But to be clear, that is nothing more than non-customers can expect. Our editorial standard for our blog is that all articles are informed opinions. It is reasonable to expect that a qualified expert at Percona will review anything that is published. That is the norm for our blog. If we make mistakes, we welcome your corrections, but we are not obligated to get your approval before publishing (although we often extend the courtesy of sending drafts for review before publishing).

Being a Percona customer does not entitle vendors to purchase marketing exposure from us, but we will agree on an intent to publish the results of product research. The distinction is subtle, but important to understand. The deliverable in the contract will not be “publish a blog post” or “publish a white paper.” The deliverable will be to produce something publishable that meets our standards for balanced and objective reporting. However, Percona does the writing, not the vendor. Sometimes vendors suggest wording, but we decide whether to use it. When we are finished researching, and we have written our conclusions, we offer the draft to the vendor and invite any corrections. We then offer the vendor the option to publish in its entirety, uncensored. The vendor has the right to approve or refuse the publication in its entirety. If we found something negative and we want to publish that, vendors can’t ask us to remove that and publish the positive results.

Although we insist on editorial freedom, that does not mean that we operate with blinders on; that could be disrespectful to customers who pay for our research. When we work on paid research with the intent to publish, we try to be circumspect about anything else we write on the specific topic of research, and we try to err on the side of being too courteous to our customers if there is uncertainty.

If you hire us to review a product with a specific outcome in mind, then that will naturally influence the results of our research and subsequent publication. For example, suppose you are designing a product to perform very well on a specific benchmark, and you want us to validate that it does. When we publish the results, we will reveal that the pre-existing agenda was to validate this benchmark, and we will also try to provide a balanced assessment overall of the product and its suitability for various use cases, including all known downsides. For example, if your product does extremely well on TPC-H benchmark queries and crashes on other queries, we will reveal that to provide a balanced review of it. When feasible, we will try to evaluate the product in a way that protects us from bias and helps us be more thorough, such as being a “mystery shopper” as well as examining the product or service that you are aware we’re inspecting.

There have been times when vendors have told us not to publish the results of our research. You may ask whether this is censorship in itself, and whether lack of disclosure creates an unbalanced and unfair situation — lying by omission, so to speak. The answer is that vendors might not be ready with their product. We aren’t here to throw tomatoes at unfinished solutions. Vendors who hire us to do research but then discover that the result isn’t as positive as they desire will often come back to us later, after fixing the issues we identified. Sometimes the outcome is more positive the second time around, and is something they are comfortable with letting us publish.

When we publish the results of paid research, we disclose that we were paid to do it. Gifts, such as Fusion-IO’s gift of hardware, follow the same rules. This transparency assures readers that when something isn’t mentioned as paid research, then it wasn’t paid for, and there is no puppetry behind the scenes. Examples of customers whose paid research resulted in a blog post or white paper are Virident and Tokutek. We try to ensure that our paid research is just as objective as unpaid research, but it is still necessary to disclose the paid nature of the work. This means that if you want to hire Percona to evaluate your solution, and hopefully publish an opinion about it, then you must be willing to disclose that you are a Percona customer.

Use of Percona’s name by vendors

We do not authorize vendors to use Percona’s name or reputation as an endorsement unless our executive team explicitly approves of it on a case-by-case basis. For example, suppose that you develop a proposed application architecture for a sales prospect (who may or may not be a Percona client). You hire Percona to review the architecture. We do not authorize you to mention our name in connection with the proposed architecture, unless we first approve. We will also require you to disclose to the client our full, uncensored report on the proposed architecture. For example, we will not authorize you to say “we developed this proposed architecture and had Percona review it” without saying anything more. The client is entitled to receive Percona’s review of the architecture without having to ask for it. If we do not think the proposed solution is a good idea for the prospect to implement, and you are not comfortable with our report, then we may not allow you to mention our name at all.

If a vendor wants to use the results of Percona’s research in part, but not perform the full and balanced disclosure that Percona would perform, then that is acceptable as long as our name is not revealed. To revisit the TPC-H example, a vendor whose product ran well on that benchmark but crashed on other simple queries is welcome to quote the benchmark results without mentioning the crashes, as long as they don’t say Percona’s name. The vendor can buy the benchmark, but not our name.

If we become aware of a vendor using Percona’s name as an endorsement in a way we have not authorized, then depending on the circumstances we may decide that we are obligated to respond, perhaps by contacting the person or organization who has been given misleading or incomplete information about Percona’s involvement. If you are considering hiring Percona to review a proposal, you might want to avoid disclosing our involvement until you see the results of our review. If the prospect knows that we are reviewing the proposal and you decline to reveal the results, then there could be an implied endorsement, and we may feel obligated to make the review available to the client directly. Our willingness to protect the integrity of our name through such direct action serves as a strong deterrent to most misuse.

If you are the sales prospect, and the vendor recommends a solution to you and says that Percona endorses it, then you are welcome to contact us and verify that claim, and clarify any details such as the extent of our involvement.

When vendors ask to involve us in the sales process

Vendors often ask us to work with them in the sales process. For example, they might ask us to participate in a product evaluation so the prospective client can decide whether the solution meets their needs. We are happy to do this, provided that it’s clear that our involvement doesn’t in itself constitute an endorsement, as per the previous section on use of our name.

When a vendor wants us to evaluate their solution for a prospective client, we generally bill them for all of the time that we spend on their behalf. However, if the prospect says “I’d also like you to compare the vendor’s solution to something else,” and that isn’t necessary to do a balanced evaluation, then we will bill the prospective client for that, not the vendor. If we think that this extra work is required for us to correctly assess the solution, then we will tell the vendor and bill them for the work. In other words, vendors can’t hire us to do a partial or inconclusive evaluation; we require that the evaluation is full and thorough, and we bill the vendor for that.

On the other hand, if the consumer requests us to do the evaluation, without the vendor’s involvement, then we bill the consumer for the work. If both of them request it, that might be a gray area we deal with on a case-by-case basis. In reality, this is rare. Most vendors have active sales efforts, so most evaluations are paid for by the vendors.

After the evaluation is over, we may indeed endorse the solution, but to the extent we can, we try to ensure that our endorsement is not influenced by any of the above. It will always represent our best effort at an objective opinion on the technical merits of the solution, and the suitability for the business use case. (Again, sometimes bias is unavoidable.)

Vendors who approach us to be directly involved with an evaluation should understand that it is possible that we’ll recommend against the solution, and even that the sales prospect might hire Percona instead to pursue an alternative solution. We do not enter into agreements that restrict our ability to help the prospect solve their problems in the best way possible. As such, we generally do not agree to exclusivity agreements or do-not-compete agreements with vendors. If we did, it would prevent us from offering the best possible service to all involved.

As with product research or evaluations intended for a general audience, we are fine with vendors hiring us to validate a certain result for a specific sales prospect, even if it’s not what we consider complete and balanced. As long as Percona’s name is not mentioned, the vendor is free to cherry-pick from the results they disclose to the prospect. We are also okay with vendors keeping us at arm’s length from the prospect and preventing them from ever knowing that Percona was involved in the sales cycle, or even with vendors hiring us to assess something for a prospect whose identity they don’t disclose to us, again provided that our name is not used as a sales tool. We will not endorse a solution for an unknown party, because we are not able to provide a balanced assessment of its suitability for them.

Sometimes when a vendor asks if we want to be involved with an effort to sell to a prospect, they will use a phrase such as “this could be a win-win deal for both of us.” We request that it be win-win-win for all three of us. Most vendors happily adjust their approach to try to make the prospect’s interests of equal importance when requested. We try to ensure that this is the case, and we may recuse ourselves if we have objections that are not addressed.

Sometimes vendors who want Percona to create a favorable outcome for them will offer something in return, or imply that they’ll withhold something if we don’t participate. For example, we were once offered future referrals of services in exchange for endorsing a sales proposal. When we declined, the vendor told us that this was a very large number of potential clients and they would not refer anyone to us if we did not have a change of heart. We think that providing the best possible service includes making referrals to providers whose expertise and ethics we trust, without expecting anything in return, when we aren’t the best fit. For example, we frequently refer people to expert PostgreSQL consultants when needed. We believe that people who need Percona’s services will find us without referrals, and we are not willing to sacrifice our integrity for a short-term gain.

Sometimes we have also been told not to involve ourselves in some aspects of the proposal we are evaluating, and asked to focus only on specific aspects, without advising on the broader picture, such as whether it’s good for the prospect’s business. This is equivalent to an editorial restriction, and is not something we can accept if our name is disclosed to the prospect. It is our job to counsel on the solution, not just an aspect of the solution in isolation. We are ethically and legally obligated to do so.

Partnership relationships

In general, we do not enter into the usual type of partner relationships, and no partnership status or obligation is implied by hiring Percona. We have had many companies approach us with a partnership in mind, but the standard partnership relationship does not usually mesh well with the values discussed previously in this blog post. A partnership relationship usually implies some type of preferential treatment, such as an expectation that Percona would promote the partner’s product or service above others that might be suitable for a customer. That would compromise our core values. Most offers of partnerships either turn into paid evaluations on a consulting basis, or we reach a mutual understanding that it isn’t a good match and we don’t form any formal relationship at all.

There are also vendors whom we know or have worked with, and we have no formal partnership or other relationship, but we have what you could call an “informational” partnership. For example, we sometimes seek out or respond to requests from vendors who want to get to know us and vice versa. Sometimes we invite vendors to speak to our consulting team on our periodic conference calls, to present their solution and enter into a technical dialogue with a large team of people, which is not only fun but creates very educational conversations. This has proven to be a good way to gain at least surface-level familiarity with technologies that we might not have looked into otherwise. We always make recommendations and suggestions based on what we know, and never because we’re paid to; and this is equally applicable to such friendship types of relationships.


I hope that the above discussion provides some insight into how we strive to remain independent and objective, making informed decisions and recommendations as fairly as possible, placing everyone’s interests on an equal basis as much as we can, and ensuring that the integrity of our name and work is uncompromised. If you have any questions or comments, I invite you to use the comment form, or if it’s not appropriate to discuss in public, then I and the entire Percona executive team would welcome a private conversation about this. You can contact us through our contact form and it will be routed to the correct people.


Virtualization and IO Modes = Extra Complexity

It has taken years to get proper integration between the operating system kernel, device drivers, and hardware so that caches and IO modes behave correctly. I remember us having a lot of trouble with fsync() not flushing the hard drive write cache, so that data could be lost on power failure. Happily, most of these issues are resolved now with “real hardware,” and I’m pretty confident running Innodb with either the default (fsync-based) or O_DIRECT innodb_flush_method. Virtualization, however, adds yet another layer, and we need to ask again whether IO is really durable in virtualized environments. My simple testing shows this may not always be the case.

I’m comparing O_DIRECT and fsync() single-page writes to a 1MB file using SysBench on Ubuntu with ext4, running on VirtualBox 4.0.4 on Windows 7 on my desktop computer, which has a pair of 7200 RPM hard drives in RAID1. Because there is no write cache, I expect no more than a bit over 100 writes per second: even when there is no disk seek, we have to wait for the platter to complete a full rotation. I’m getting rather bizarre results, however:
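That expected ceiling follows directly from the rotational speed. A minimal back-of-the-envelope calculation (my own illustration, not part of the original test):

```python
# With the write cache disabled, each durable write on a spinning disk
# must wait for the platter to come around again -- roughly one full
# rotation in the worst case -- so rotational speed caps the write rate.
RPM = 7200
rotations_per_sec = RPM / 60.0           # 120 rotations/sec
max_durable_writes = rotations_per_sec   # ~120 synced writes/sec ceiling

print(f"{rotations_per_sec:.0f} rotations/sec -> "
      f"at most ~{max_durable_writes:.0f} durable writes/sec")
```

Any sustained result far above this number means some layer is acknowledging writes before they reach the platter.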

Using fsync()

pz@ubuntu:~/test$ sysbench --num-threads=1 --test=fileio --file-num=1 --file-test-mode=rndwr  --file-total-size=1M --max-requests=10000000 --max-time=60 --file-fsync-freq=1 run
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 1Mb each
1Mb total file size
Block size 16Kb
Number of random requests for random IO: 10000000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...

Operations performed:  0 Read, 1343 Write, 1343 Other = 2686 Total
Read 0b  Written 20.984Mb  Total transferred 20.984Mb  (357.62Kb/sec)
   22.35 Requests/sec executed

Test execution summary:
    total time:                          60.0863s
    total number of events:              1343
    total time taken by event execution: 0.0808
    per-request statistics:
         min:                                  0.04ms
         avg:                                  0.06ms
         max:                                  0.34ms
         approx.  95 percentile:               0.06ms

Threads fairness:
    events (avg/stddev):           1343.0000/0.00
    execution time (avg/stddev):   0.0808/0.00

Ignore the response times here, as they cover only the writes, not the fsync() calls. Still, 22 fsync() requests per second is pretty bad, though I assume it could be realistic given the virtualization overhead.

Now let’s see how it looks using O_DIRECT:

pz@ubuntu:~/test$ sysbench --num-threads=1 --test=fileio --file-num=1 --file-test-mode=rndwr --file-extra-flags=direct --file-total-size=1M --max-requests=10000000 --max-time=60 run
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 16384
1 files, 1Mb each
1Mb total file size
Block size 16Kb
Number of random requests for random IO: 10000000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...

Operations performed:  0 Read, 33900 Write, 339 Other = 34239 Total
Read 0b  Written 529.69Mb  Total transferred 529.69Mb  (8.8278Mb/sec)
  564.98 Requests/sec executed

Test execution summary:
    total time:                          60.0019s
    total number of events:              33900
    total time taken by event execution: 37.5364
    per-request statistics:
         min:                                  0.10ms
         avg:                                  1.11ms
         max:                                259.69ms
         approx.  95 percentile:               5.31ms

Threads fairness:
    events (avg/stddev):           33900.0000/0.00
    execution time (avg/stddev):   37.5364/0.00

I would expect results rather similar to the fsync() test, yet we’re getting numbers 20 times better… surely too good to be true. This means I can be fairly sure the system is lying about write completion when we’re using O_DIRECT IO.

What is my takeaway from this? I did not have time to research whether the problem is related to VirtualBox or to some configuration issue, and things may be working correctly in your case. The point is that virtualization adds complexity, and there are at least some cases where you may be lied to about IO completion. So if you’re relying on the system to be able to recover from a power failure or VM crash, make sure to test it carefully.
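If you want a quick sanity check of your own environment without setting up sysbench, a crude probe along the same lines can be scripted in a few lines. This is a hypothetical sketch, not the test the post used: it times fsync()’d single-page rewrites, and a rate far above what the disk’s RPM allows is a red flag (an O_DIRECT variant would need an aligned buffer and `os.O_DIRECT`, omitted here for simplicity):

```python
import os
import time

def synced_write_rate(path, n=200, page=16384):
    """Time n fsync()'d rewrites of one 16KB page; return writes/sec."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    buf = b"\0" * page
    start = time.time()
    for _ in range(n):
        os.pwrite(fd, buf, 0)   # rewrite the same page in place
        os.fsync(fd)            # request durability after every write
    os.close(fd)
    os.unlink(path)
    return n / (time.time() - start)

rate = synced_write_rate("/tmp/durability_probe")
# On a cacheless 7200 RPM drive, anything far above ~120/sec suggests
# some layer is acknowledging writes before they hit stable storage.
print(f"{rate:.0f} synced writes/sec")
```

A tool like this only flags suspiciously fast acknowledgments; the definitive test is still pulling the plug and checking what survives.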


What Causes Downtime in MySQL?

We’ve just published a new white paper analyzing the causes of emergency incidents filed by our customers. The numbers contradict the urban myth that bad SQL is the most common problem in databases. There are a number of surprises in other areas, too, such as the causes of data loss. This is the companion to my earlier white paper suggesting ways to prevent emergencies in MySQL. It is a re-published and re-edited version of an article that just appeared in IOUG’s SELECT magazine. You can download it for free from the MySQL white papers page on the Percona web site.


Choosing an appropriate benchmark length

The duration of a benchmark is an important factor that helps determine how meaningful it is. Most systems have some “burstable capacity,” and this can influence the results a lot. You can see this in all areas of life — you can sprint much faster than you can run a 10k race. Your stereo system components are usually advertised in both peak and sustained output. Transducers can generally hit peaks that would melt them due to heat dispersion challenges if run at that level long-term. Database servers are no different. Many components in the system have the capacity to absorb peaks. But buffers eventually fill if pressured for a long time.

When designing a benchmark, you should think about what type of performance characteristics you are looking for in your production system. If you want a system that can handle peak loads that don’t last very long, then measuring burstable capacity with a short benchmark might be okay. But if you want to measure how the system will perform over a long time with a sustained load, then you need to run your benchmark for a long time.

This can be costly and time-consuming. If your cycle time is 8 hours or more, this can be frustrating, too. If you don’t time it right, you might only be able to fit in one or two benchmarks a day. Vadim runs a lot of long-term benchmarks on MySQL and Percona Server. He is a very patient man. Mark Callaghan has run benchmarks that last for months.

Sometimes you don’t know how long your benchmark should run until you try it. This was the case in a recent benchmark I ran. The following image shows the system’s IO behavior:

As you can see, the reads settled down after only 3 hours or so, but writes continued to climb until at least 8 hours. How long should I run this benchmark? In general, to understand the long-term performance, it should run at least twice as long as it takes for the system to settle in and appear fully warmed up. At that point, you should examine the preliminary graphs; if there is some unexplained variation, you should continue running until you have determined that the system is behaving according to its long-term pattern. This might have cyclical variations. What is that notch near the right-hand side of the graph? Is that the beginning of a repeating pattern, a one-time event, or something else? There is only one way to tell: keep running the benchmark. I ended up running the above benchmark for 72 hours to ensure that it was exhibiting its typical long-term behavior.
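The “run at least twice as long as the warm-up” rule can be automated in a rough way. Here is an illustrative settling check of my own (not a Percona tool): treat the metric as settled once the means of two consecutive windows agree within a tolerance, then budget at least double that warm-up time for the full run:

```python
def settle_time(samples, window=4, tolerance=0.10):
    """Return the index where two consecutive window means first agree
    within `tolerance` (relative), or None if the metric never settles."""
    for i in range(window, len(samples) - window + 1):
        prev = sum(samples[i - window:i]) / window
        curr = sum(samples[i:i + window]) / window
        if prev and abs(curr - prev) / prev <= tolerance:
            return i
    return None

# Hypothetical hourly write-rate samples from a warming-up benchmark:
writes_per_hour = [50, 80, 100, 110, 115, 117, 118, 118, 119, 119]
t = settle_time(writes_per_hour)
print(f"settled after ~{t} samples; run at least {2 * t} to be safe")
```

In practice you would still eyeball the graphs; a threshold like this cannot distinguish a one-time notch from the start of a cyclical pattern.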


Percona Server and XtraBackup weekly news, March 19th

Here’s what is new in Percona Server and Percona XtraBackup since last week. We are working on compiling and checking download statistics for both pieces of software. The preliminary numbers are far higher than I thought they would be. Both Percona Server and XtraBackup are becoming amazingly popular. I am aware of deployments running into multiple thousands of instances of Percona Server in some companies.

  • Laurynas Biveinis joined the Percona Server project as a developer.
  • We continued working on our upcoming Percona Server 5.5 stable release. Vadim has also spent a lot of time benchmarking against MySQL 5.5. Sorry, I can’t reveal how good the numbers are until the benchmarks are fully validated. Watch this blog for Vadim’s announcement.
  • We began investigating the effort required to port Percona Server to Windows. We are interested in knowing how many people using MySQL on Windows would be ready to pay for us to support Percona Server on Windows. If you are, you can contact our Sales team to discuss.
  • We have changed our Debian/Ubuntu packages to build with readline, not editline. This has been a problem with a lot of Debian/Ubuntu software. This fix will be released in the next version of Percona Server.
  • We improved the performance and reliability of using shared memory for the InnoDB buffer pool.
  • We updated Percona.TV, our companion site for MySQL videos, screencasts, and webinars. A lot of new Percona-Server-related material is available there now, and very shortly our webinars will also be there, so you won’t have to hassle with Webex or GoToMeeting’s obnoxious system requirements just to watch a recorded webinar.

In XtraBackup news,

  • XtraBackup has a new logo, shown below. We stayed with the cat theme to match Percona Server’s logo.
  • We’re continuing to work on support for 5.5, and testing and finishing the new features: parallel backups, streaming, and compression inside the xtrabackup binary instead of the innobackupex script.
  • We moved the target milestone from version 1.6 to 1.7 for supporting XtraBackup on Windows. If you would like it sooner, again we can make it happen with your financial support. Please contact Sales to discuss.
Percona XtraBackup Logo


Video: The InnoDB Storage Engine for MySQL

(This is a cross post from Percona.TV – the home of Percona material in video form.)

Last month I gave a presentation at the PHP UK Conference on the InnoDB storage engine.  I was a last-minute speaker, and I want to thank them for the time-slot and their hospitality at short notice.

The video has been posted online:

The InnoDB Storage Engine for MySQL – Morgan Tocker from PHP UK Conference on Vimeo.

It covers both the built-in InnoDB and the InnoDB plugin.  I left out Percona Server and XtraDB for simplicity.

If you want to learn more about this topic, I suggest taking a look at our full-day course on InnoDB/XtraDB, the talk “Introduction to the InnoDB Storage Engine for MySQL” (Morgan Tocker) at Collaborate, or “Innodb and XtraDB Architecture and Performance Optimization” (Peter Zaitsev) at the MySQL conference.


Where does HandlerSocket really save you time?

HandlerSocket has really generated a lot of interest because of the dual promises of ease-of-use and blazing-fast performance. The performance comes from eliminating most of the CPU overhead of SQL processing. Akira Higuchi’s HandlerSocket presentation from a couple of months back had some really good profile results for libmysql versus libhsclient (starting at slide 15). Somebody in the audience at Percona Live asked about the profile results when using prepared statements, and I’m just getting around to publishing the numbers now. I’ll reproduce the original numbers here, for reference:

libmysql (Akira’s Numbers)
samples % symbol name
748022 7.7355 MYSQLParse(void*)
219702 2.2720 my_pthread_fastmutex_lock
205606 2.1262 make_join_statistics(…)
198234 2.0500 btr_search_guess_on_hash
180731 1.8690 JOIN::optimize()
177120 1.8317 row_search_for_mysql
171185 1.7703 lex_one_token(void*,void*)
162683 1.6824 alloc_root
131823 1.3632 read_view_open_now
122795 1.2699 mysql_select(…)

– Parsing SQL is slow

HandlerSocket (Akira’s Numbers)
samples % symbol name
119684 14.7394 btr_search_guess_on_hash
58202 7.1678 row_search_for_mysql
46946 5.7815 mutex_delay
38617 4.7558 my_pthread_fastmutex_lock
37707 4.6437 buf_page_get_known_nowait
36528 4.4985 rec_get_offsets_func
34625 4.2642 build_template(…)
20024 2.4660 row_sel_store_mysql_rec
19347 2.3826 btr_cur_search_to_nth_level
16701 2.0568 row_sel_convert_mysql_key_to_innobase

– Most CPU time is spent inside InnoDB (this is great, because there have been lots of InnoDB improvements, and there will continue to be)

libmysql (5.1.56)
samples % symbol name
57390 2.4548 MYSQLparse(void*)
42091 1.8004 String::copy(…)
32543 1.3920 __read_nocancel
30536 1.3062 btr_search_guess_on_hash
24630 1.0535 my_wc_mb_latin1
24407 1.0440 memcpy
23911 1.0228 MYSQLlex(void*, void*)
22392 0.9578 pthread_mutex_lock
21973 0.9399 fcntl
20529 0.8781 my_utf8_uni

– Parsing SQL is slow

libmysql w/ prepared statements (5.1.56)
samples % symbol name
18054 4.1415 String::copy(…)
14223 3.2627 make_join_statistics(…)
11934 2.7376 JOIN::optimize()
10140 2.3261 my_wc_mb_latin1
7152 1.6407 my_utf8_uni
7092 1.6269 Protocol::send_fields(List*, unsigned int)
6530 1.4980 JOIN::prepare(…)
6175 1.4165 Protocol::store_string_aux(…)
5748 1.3186 create_ref_for_key(…)
5325 1.2215 mysql_execute_command(THD*)

– Still lots of time with SQL overhead (not parsing)

HandlerSocket (5.1.56)
samples % symbol name
18946 2.9918 btr_search_guess_on_hash
15853 2.5034 vfprintf
8745 1.3810 row_search_for_mysql
7021 1.1087 buf_page_get_known_nowait
5212 0.8230 __find_specmb
5116 0.8079 dena::dbcontext::cmd_find_internal(…)
5100 0.8054 build_template(…)
4866 0.7684 _IO_default_xsputn
4794 0.7570 dena::hstcpsvr_worker::run_one_ep()
4538 0.7166 send

– Most of the time is in InnoDB & HandlerSocket

I won’t comment on these in much more detail because they’re fairly self-explanatory. I’ve intentionally omitted timing information because the point of these numbers is to show what HandlerSocket avoids doing compared with standard SQL access. The next step is to test with 5.5!
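As a reminder of what that access path looks like, HandlerSocket clients skip SQL entirely and speak a small text protocol: tab-separated, newline-terminated requests that open an index and then read through it. Here is a minimal sketch of building those requests; the framing follows the protocol documentation shipped with HandlerSocket, but treat the exact fields as illustrative rather than authoritative:

```python
# Sketch of HandlerSocket's text protocol: tab-separated tokens,
# newline-terminated. This is what replaces parsed SQL statements.

def open_index(index_id, db, table, index, columns):
    # 'P' opens an index and binds it to a client-chosen numeric id,
    # so later reads don't repeat the db/table/index names.
    return "\t".join(["P", str(index_id), db, table, index,
                      ",".join(columns)]) + "\n"

def find(index_id, op, values, limit=1, offset=0):
    # A read request: <indexid> <op> <num values> <values...> <limit> <offset>
    parts = ([str(index_id), op, str(len(values))]
             + [str(v) for v in values]
             + [str(limit), str(offset)])
    return "\t".join(parts) + "\n"

print(repr(open_index(0, "test", "user", "PRIMARY", ["user_id", "name"])))
# 'P\t0\ttest\tuser\tPRIMARY\tuser_id,name\n'
print(repr(find(0, "=", [42])))
# '0\t=\t1\t42\t1\t0\n'
```

Because the server only has to split a handful of tab-separated fields, there is no parser or optimizer work left to show up in a profile.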


Pretty-formatted index fragmentation with xtrabackup

The xtrabackup compiled C binary (as distinct from XtraBackup, which is the combination of the C binary and the Perl script) has support for printing out stats on InnoDB tables and indexes. This can be useful to examine whether you’d benefit from “defragmenting” your MySQL database with OPTIMIZE TABLE, although I have not determined firm guidelines for when that will actually help. I’ve written a small Perl script that formats the stats output nicely to give an overview of fragmentation.

It’s an initial draft, and if you find issues with it I would like to know so I can fix them. The script is embedded in the documentation page and can be downloaded by clicking on the header at the top of the code listing. The output looks like this:

art.link_out104                    832383      38561      86.8%
art.link_out104         PRIMARY    498304         49      91.9%
art.link_out104       domain_id     49600       6230      76.9%
art.link_out104     domain_id_2     26495       3339      89.1%
art.link_out104 from_message_id     28160        142      96.3%
art.link_out104    from_site_id     38848       4874      79.4%
art.link_out104   revert_domain    153984      19276      71.4%
art.link_out104    site_message     36992       4651      83.4%

That output was generated from the stats output that Vadim showed on an earlier blog post about xtrabackup’s analysis capabilities.
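For the curious, the fill percentages in that output reduce to simple arithmetic over two figures reported per index in the stats output: how many leaf pages are allocated, and how many bytes of data those pages actually hold. A sketch of that calculation in Python (the real script is Perl; the helper name and example figures here are mine):

```python
INNODB_PAGE_SIZE = 16 * 1024  # default InnoDB page size

def fill_percent(leaf_pages, data_bytes):
    """Percentage of the allocated leaf pages actually occupied by data.
    A low value suggests fragmentation that OPTIMIZE TABLE may reclaim."""
    return 100.0 * data_bytes / (leaf_pages * INNODB_PAGE_SIZE)

# e.g. 1000 leaf pages holding 12 MiB of data are about 76.8% full
print(round(fill_percent(1000, 12 * 1024 * 1024), 1))
```

Note that even a freshly rebuilt index will not show 100%, since InnoDB deliberately leaves some free space in each page for future inserts.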


MySQL on Amazon RDS part 1: insert performance

Amazon’s Relational Database Service (RDS) is a cloud-hosted MySQL solution. I’ve had some clients hitting performance limitations on standard EC2 servers with EBS volumes (see SSD versus EBS death match), and one of them wanted to evaluate RDS as a replacement. It is built on the same technologies, but the hardware and networking are supposed to be dedicated to RDS, not shared with the general usage of AWS as you get on normal EC2 servers with EBS.

I benchmarked the largest available RDS instance, which is listed as “High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity.” I used sysbench’s oltp benchmark, with 400,000,000 rows. This creates a table+index data size approximately twice as big as memory, so the workload should be somewhat IO-bound.
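As a back-of-the-envelope check on that sizing claim: a sysbench oltp row (an INT id, an INT k, a CHAR(120) and a CHAR(60)) plus InnoDB row and secondary-index overhead comes to very roughly 340 bytes. That per-row figure is my rough assumption, not a measured value:

```python
# Rough check that 400M sysbench oltp rows are about twice the
# instance's 68 GB of memory. bytes_per_row is an assumed estimate
# (row payload plus InnoDB and index overhead), not a measurement.
rows = 400_000_000
bytes_per_row = 340
memory_gb = 68

total_gb = rows * bytes_per_row / 1024**3
print(round(total_gb, 1), round(total_gb / memory_gb, 2))
```

The result lands in the neighborhood of 1.9 times memory, consistent with a workload that cannot stay fully cached in the buffer pool.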

My goal for this benchmark was long-term performance, but as a side project, I thought it would be interesting to measure the single-threaded insert throughput while sysbench ran its “prepare” phase and filled the table with 400 million rows. Here is the chart of rows inserted per minute:

We can deduce a few things from this.

  1. The overall downward slope of the line is steady enough to show that we did not cross a dramatic memory-to-disk threshold, as famously happens in B-Tree inserts (see InnoDB vs TokuDB for example). This doesn’t mean that we weren’t IO-bound; it might only mean that we were IO-bound the whole time waiting on fsync operations. But we didn’t go from a solely in-memory bottleneck to solely on-disk.
  2. The insert performance is quite variable, more so than I would like to see. My intuition is that there are some severe I/O slowdowns.
  3. I should have gathered more statistics and finer-grained samples, say, every 5 seconds instead of every minute, along with more data such as SHOW INNODB STATUS output. But I was working on the client’s time, and I did not see that redoing the test would really benefit them.
  4. Finally, a single-threaded insert workload is not very revealing. To understand the sustained write performance of an RDS instance, we need a multi-threaded long-term insert benchmark such as IIBench.

In the next post in this series, we will see how the Amazon RDS instance performed at various thread counts on the OLTP benchmark.

Update: Vadim and Peter have rightly pointed out that I shouldn’t have published this result without being able to explain exactly what was happening on the server. I will reproduce this test, capture more measurements of what was going on, and post a follow-up before continuing with the actual sysbench benchmark results.


Upcoming Webinar on HandlerSocket

On March 29th, I’ll be giving a webinar titled “Understanding HandlerSocket – A NoSQL PlugIn For MySQL”. This is a continuation and extension of the talk I gave during the Percona Live event in San Francisco back in February. We’ll ask, and answer, the following questions:

  • What is HandlerSocket?
  • Where does HandlerSocket fit in my application stack?
  • Why would I want to use HandlerSocket?
  • How do I use HandlerSocket?


To register:
