Percona Live Europe Featured Talks: Modern sysbench – Teaching an Old Dog New Tricks with Alexey Kopytov

Percona Live Europe 2017

Percona Live EuropeWelcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Alexey Kopytov, sofware developer and maintainer of sysbench. His talk is Modern sysbench: Teaching an Old Dog New Tricks. His presentation present new features provided by recent releases and explain how they can be used to create complex benchmark scenarios and collect performance metrics with a simple Lua API. It will also run a live demo of some of the new sysbench features.

In our conversation, we discussed benchmarking your database environment:

Percona: How did you get into database technology? What do you love about it?

Alexey: It was 2003, and I was working as a software developer for a boring company providing hosted VoIP solutions. I was a big fan of the free and open source software philosophy, which was way less popular back then than it is today. I contributed to a number of open source projects in my free time, but I also had a dream of developing open source software as part of my paid job. This looked completely unrealistic at the time, until I came across a job posting on a Russian IT forum about a Swedish company called MySQL AB looking for software developers to work remotely on MySQL! That sounded like my dream job, so I applied.

I knew very little about database internals at the time, so looking back I was giving terrible answers during my job interviews. Nevertheless, I joined the High Performance Group at MySQL AB after a few months, and that has defined my professional life for many years.

I love database technology because it presents the toughest challenges in software development. Most problems and solutions related to ever-evolving hardware, scalability and data processing requirements are discovered first by people from the database world.

Percona: Your talk is called “Modern sysbench: Teaching an Old Dog New Tricks”. What is sysbench used for generally, why is it important and how have you used it in your career? 

Alexey: sysbench was an internal project that I took over as soon as I joined MySQL AB. We used it to troubleshoot customer issues, find performance bottlenecks in MySQL and evaluate new features. Of course it was an open source project, so over the years we’ve got many people from the MySQL community using sysbench for all kinds of performance research like testing new hardware, identifying performance-related issues and comparing MySQL configurations, versions and forks.

Percona: What are some of the important new developments in the latest release?

Alexey: This year sysbench got a major upgrade in terms of features and performance to meet the modern world of many-core CPUs, powerful storage devices and distributed database systems capable of processing millions of transactions per second. Some feature highlights from the latest release include simplified command-line interface, a revamped API which allows creating more complex benchmark scenarios with less code, new performance metrics, customizable reports and more!

Percona: What do you want attendees to take away from your session? Why should they attend?

Alexey: sysbench is quite popular, but most people rarely use it more than a few bundled OLTP-style benchmarks. I’d like to explain its full potential, especially the possibilities provided by the new features. I want people to use it to create their own benchmarks, not necessarily related to MySQL, and hopefully find sysbench useful in areas that I have not even envisioned myself.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Alexey: For me Percona Live conferences have always been the place where I can feel the pulse of the technology and learn from the smartest people in the industry. This is especially true now that Percona Live provides talks on diverse topics from communities and database management technologies other than MySQL. Which makes it an even greater event to share ideas, solutions and expertise.

Want to find out more about Alexey, sysbench and database benchmarking? Register for Percona Live Europe 2017, and see his talk Modern sysbench: Teaching an Old Dog New Tricks. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.


sysbench Histograms: A Helpful Feature Often Overlooked

Sysbench Histograms

Sysbench HistogramsIn this blog post, I will demonstrate how to run and use sysbench histograms.

One of the features of sysbench that I often I see overlooked (and rarely used) is its ability to produce detailed query response time histograms in addition to computing percentile numbers. Looking at histograms together with throughput or latency over time provides many additional insights into query performance.

Here is how you get detailed sysbench histograms and performance over time:

sysbench --rand-type=uniform --report-interval=1 --percentile=99 --time=300 --histogram --mysql-password=sbtest oltp_point_select --table_size=400000000 run

There are a few command line options to consider:

  • report-interval=1 prints out the current performance measurements every second, which helps see if performance is uniform, if you have stalls or otherwise high variance
  • percentile=99 computes 99 percentile response time, rather than 95 percentile (the default); I like looking at 99 percentile stats as it is a better measure of performance
  • histogram=on produces a histogram at the end of the run (as shown below)

The first thing to note about this histogram is that it is exponential. This means the width of the buckets changes with higher values. It starts with 0.001 ms (one microsecond) and gradually grows. This design is used so that sysbench can deal with workloads with requests that take small fractions of milliseconds, as well as accommodate requests that take many seconds (or minutes).

Next, we learn some us very interesting things about typical request response time distribution for databases. You might think that this distribution would be close to some to some “academic” distributions, such as normal distribution. In reality, we often observe is something of a “camelback” distribution (not a real term) – and our “camel” can have more than two humps (especially for simple requests such as the single primary key lookup shown here).

Why do request response times tend to have this distribution? It is because requests can take multiple paths inside the database. For example, certain requests might get responses from the MySQL Query Cache (which will result in the first hump). A second hump might come from resolving lookups using the InnoDB Adaptive Hash Index. A third hump might come from finding all the data in memory (rather than the Adaptive Hash Index). Finally, another hump might coalesce around the time (or times) it takes to execute on requests that require disk IO.    

You also will likely see some long-tail data that highlights the fact that MySQL and Linux are not hard, real-time systems. As an example, this very simple run with a single thread (and thus no contention) has an outlier at around 18ms. Most of the requests are served within 0.2ms or less.

As you add contention, row-level locking, group commit and other issues, you are likely to see even more complicated diagrams – which can often show you something unexpected:

Latency histogram (values are in milliseconds)
      value  ------------- distribution ------------- count
      0.050 |                                         1
      0.051 |                                         2
      0.052 |                                         2
      0.053 |                                         54
      0.053 |                                         79
      0.054 |                                         164
      0.055 |                                         883
      0.056 |*                                        1963
      0.057 |*                                        2691
      0.059 |**                                       4047
      0.060 |****                                     9480
      0.061 |******                                   15234
      0.062 |********                                 20723
      0.063 |********                                 20708
      0.064 |**********                               26770
      0.065 |*************                            35928
      0.066 |*************                            34520
      0.068 |************                             32247
      0.069 |************                             31693
      0.070 |***************                          41682
      0.071 |**************                           37862
      0.073 |********                                 22691
      0.074 |******                                   15907
      0.075 |****                                     10509
      0.077 |***                                      7853
      0.078 |****                                     9880
      0.079 |****                                     10853
      0.081 |***                                      9243
      0.082 |***                                      9280
      0.084 |***                                      8947
      0.085 |***                                      7869
      0.087 |***                                      8129
      0.089 |***                                      9073
      0.090 |***                                      8364
      0.092 |***                                      6781
      0.093 |**                                       4672
      0.095 |*                                        3356
      0.097 |*                                        2512
      0.099 |*                                        2177
      0.100 |*                                        1784
      0.102 |*                                        1398
      0.104 |                                         1082
      0.106 |                                         810
      0.108 |                                         742
      0.110 |                                         511
      0.112 |                                         422
      0.114 |                                         330
      0.116 |                                         259
      0.118 |                                         203
      0.120 |                                         165
      0.122 |                                         126
      0.125 |                                         108
      0.127 |                                         87
      0.129 |                                         83
      0.132 |                                         55
      0.134 |                                         42
      0.136 |                                         45
      0.139 |                                         41
      0.141 |                                         149
      0.144 |                                         456
      0.147 |                                         848
      0.149 |*                                        2128
      0.152 |**                                       4586
      0.155 |***                                      7592
      0.158 |*****                                    13685
      0.160 |*********                                24958
      0.163 |*****************                        44558
      0.166 |*****************************            78332
      0.169 |*************************************    98616
      0.172 |**************************************** 107664
      0.176 |**************************************** 107154
      0.179 |****************************             75272
      0.182 |******************                       49645
      0.185 |****************                         42793
      0.189 |*****************                        44649
      0.192 |****************                         44329
      0.196 |******************                       48460
      0.199 |*****************                        44769
      0.203 |**********************                   58578
      0.206 |***********************                  61373
      0.210 |**********************                   58758
      0.214 |******************                       48012
      0.218 |*************                            34533
      0.222 |**************                           36517
      0.226 |*************                            34645
      0.230 |***********                              28694
      0.234 |*******                                  17560
      0.238 |*****                                    12920
      0.243 |****                                     10911
      0.247 |***                                      9208
      0.252 |****                                     10556
      0.256 |***                                      7561
      0.261 |**                                       5047
      0.266 |*                                        3757
      0.270 |*                                        3584
      0.275 |*                                        2951
      0.280 |*                                        2078
      0.285 |*                                        2161
      0.291 |*                                        1747
      0.296 |*                                        1954
      0.301 |*                                        2878
      0.307 |*                                        2810
      0.312 |*                                        1967
      0.318 |*                                        1619
      0.324 |*                                        1409
      0.330 |                                         1205
      0.336 |                                         1193
      0.342 |                                         1151
      0.348 |                                         989
      0.354 |                                         985
      0.361 |                                         799
      0.367 |                                         671
      0.374 |                                         566
      0.381 |                                         537
      0.388 |                                         351
      0.395 |                                         276
      0.402 |                                         214
      0.409 |                                         143
      0.417 |                                         80
      0.424 |                                         85
      0.432 |                                         54
      0.440 |                                         41
      0.448 |                                         29
      0.456 |                                         16
      0.464 |                                         15
      0.473 |                                         11
      0.481 |                                         4
      0.490 |                                         9
      0.499 |                                         4
      0.508 |                                         3
      0.517 |                                         4
      0.527 |                                         4
      0.536 |                                         2
      0.546 |                                         4
      0.556 |                                         4
      0.566 |                                         4
      0.587 |                                         1
      0.597 |                                         1
      0.608 |                                         5
      0.619 |                                         3
      0.630 |                                         2
      0.654 |                                         2
      0.665 |                                         5
      0.677 |                                         26
      0.690 |                                         298
      0.702 |                                         924
      0.715 |*                                        1493
      0.728 |                                         1027
      0.741 |                                         1112
      0.755 |                                         1127
      0.768 |                                         796
      0.782 |                                         574
      0.797 |                                         445
      0.811 |                                         415
      0.826 |                                         296
      0.841 |                                         245
      0.856 |                                         202
      0.872 |                                         210
      0.888 |                                         168
      0.904 |                                         217
      0.920 |                                         163
      0.937 |                                         157
      0.954 |                                         204
      0.971 |                                         155
      0.989 |                                         158
      1.007 |                                         137
      1.025 |                                         94
      1.044 |                                         79
      1.063 |                                         52
      1.082 |                                         36
      1.102 |                                         25
      1.122 |                                         25
      1.142 |                                         16
      1.163 |                                         8
      1.184 |                                         5
      1.205 |                                         7
      1.227 |                                         2
      1.250 |                                         4
      1.272 |                                         3
      1.295 |                                         3
      1.319 |                                         2
      1.343 |                                         2
      1.367 |                                         1
      1.417 |                                         2
      1.791 |                                         1
      1.996 |                                         2
      2.106 |                                         2
      2.184 |                                         1
      2.264 |                                         1
      2.347 |                                         2
      2.389 |                                         1
      2.433 |                                         1
      2.477 |                                         1
      2.568 |                                         2
      2.615 |                                         1
      2.710 |                                         1
      2.810 |                                         1
      2.861 |                                         1
      3.187 |                                         1
      3.488 |                                         1
      3.816 |                                         1
      4.028 |                                         1
      6.913 |                                         1
      7.565 |                                         1
      8.130 |                                         1
     17.954 |                                         1

I hope you give sysbench histograms a try, and see what you can discover!


Performance Schema Benchmarks: OLTP RW

Performance Schema Benchmarks

In this blog post, we’ll look at Performance Schema benchmarks for OLTP Read/Write workloads.

I am in love with Performance Schema and talk a lot about it. Performance Schema is a revolutionary MySQL troubleshooting instrument, but earlier versions had performance issues. Many of these issues are fixed now, and the default options work quickly and reliably. However, there is no free lunch! It is expected that the more instruments you turn ON, the more overhead you’ll have.

The advice I give my customers is that when in doubt, only turn ON the instruments that are required to troubleshoot your issue. Many of them ask: what exactly are the overhead costs for one instrumentation or another? I believe the best answer is “test on your system!” No generic benchmark can exactly repeat a workload on your site. But while I was working on the “OpenSource Databases on Big Machines” project, I decided to test the performance of Performance Schema as well.

I only tested a Read/Write workload. I used the same fast machine (144 CPU cores), the same MySQL options and the same SysBench commands that I described in this post. The option 


 was changed to O_DIRECT, because it’s more reasonable for real-life workloads. I also upgraded the MySQL Server version to Oracle’s MySQL 5.7.17. The reason for the upgrade was to test if the issue described in this post is repeatable with latest Oracle MySQL server version. But since I tested Performance Schema, the effect on Percona Server for MySQL should be same.

I tested nine different scenarios:

  1. “All disabled”: Performance Schema is ON, but all instruments and consumers are disabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
  2. “All enabled”: Performance Schema is ON, and all instruments and consumers are enabled.
    update setup_instruments set enabled='yes';
    update setup_consumers set enabled='yes';
  3. “Default”: Performance Schema is ON, and only default instruments and consumers are enabled.
  4. “MDL only”: only Metadata Lock instrumentation is enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name='wait/lock/metadata/sql/mdl';
  5. “Memory only”: only Memory instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'memory%';
  6. “Stages and Statements”: only Stages and Statements instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'statement%';
    update setup_consumers set enabled='yes' where name like 'events_statements%';
    update setup_instruments set enabled='yes' where name like 'stage%';
    update setup_consumers set enabled='yes' where name like 'events_stages%';
  7. “Stages only”: only Stages instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'stage%';
    update setup_consumers set enabled='yes' where name like 'events_stages%';
  8. “Statements only”: only Statements instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'statement%';
    update setup_consumers set enabled='yes' where name like 'events_statements%';
  9. “Waits only”: only Waits instrumentation enabled.
    update setup_consumers set enabled='no';
    update setup_instruments set enabled='no';
    update setup_consumers set enabled='yes' where name= 'global_instrumentation';
    update setup_consumers set enabled='yes' where name= 'thread_instrumentation';
    update setup_instruments set enabled='yes' where name like 'wait%' ;
    update setup_consumers set enabled='yes' where name like 'events_waits%';

Here are the overall results.

As you can see, some instrumentation only slightly affects performance, while others affect it a lot. I created separate graphs to make the picture clearer.

As expected, enabling all instrumentation makes performance lower:

Does this mean to use Performance Schema, you need to start the server with it ON and then disable all instruments? No! The default options have very little effect on performance:

The same is true for Metadata Locks, Memory and Statements instrumentation:

Regarding statements, I should note that I used prepared statements (which are instrumented in version 5.7). But it makes sense to repeat the tests without prepared statements.

The Stages instrumentation starts affecting performance:

However, the slowdown is reasonable and it happens only after we reach 32 concurrent threads. It still provides great insights on what is happening during query execution.

The real performance killer is Waits instrumentation:

It affects performance close to the same way as all instruments ON.


Using Performance Schema with the default options, Memory, Metadata Locks and Statements instrumentation doesn’t have a great impact on read-write workload performance. You might notice slowdowns with Stages instrumentation after reaching 32 actively running parallel connections. The real performance killer is Waits instrumentation. And even with it on, you will start to notice a performance drop only after 10,000 transactions per second.









Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads

PostgreSQL and MySQL

This blog compares how PostgreSQL and MySQL handle millions of queries per second.

Anastasia: Can open source databases cope with millions of queries per second? Many open source advocates would answer “yes.” However, assertions aren’t enough for well-grounded proof. That’s why in this blog post, we share the benchmark testing results from Alexander Korotkov (CEO of Development, Postgres Professional) and Sveta Smirnova (Principal Technical Services Engineer, Percona). The comparative research of PostgreSQL 9.6 and MySQL 5.7 performance will be especially valuable for environments with multiple databases.

The idea behind this research is to provide an honest comparison for the two popular RDBMSs. Sveta and Aleksander wanted to test the most recent versions of both MySQL and PostgreSQL with the same tool, under the same challenging workloads and using the same configuration parameters (where possible). However, because both PostgreSQL and MySQL ecosystems evolved independently, with standard testing tools (pgbench and SysBench) used for each database, it wasn’t an easy journey.

The task fell to database experts with years of hands-on experience. Sveta has worked as Senior Principal Technical Support Engineer in the Bugs Verification Group of the MySQL Support Group at Oracle for more than eight years, and since 2015  has worked as a Principal Technical Services Engineer at Percona. Alexander Korotkov is a PostgreSQL major contributor, and the developer of a number PostgreSQL features – including the CREATE ACCESS METHOD command, generic WAL interface, lockfree Pin/UnpinBuffer, index-based search for regular expressions and much more. So we have a pretty decent cast for this particular play!

SvetaDimitri Kravtchuk regularly publishes detailed benchmarks for MySQL, so my main task wasn’t confirming that MySQL can do millions of queries per second. As our graphs will show, we’ve passed that mark already. As a Support Engineer, I often work with customers who have heterogeneous database environments in their shops, and want to know about the impact of migrating jobs from one database to another. So instead, I found the chance to work with the Postgres Professional company and identify both the strong and weak points of the two databases an excellent opportunity.

We wanted to test both databases on the same hardware, using the same tools and tests. We expected to test base functionality, and then work on more detailed comparisons. That way we could compare different real-world use case scenarios and popular options.

Spoiler: We are far from the final results. This is the start of a blog series.

OpenSource Databases on Big Machines, Series 1: “That Was Close…”

PostgreSQL Professional together with Freematiq provided two modern, powerful machines for tests.

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

I also used a smaller Percona machine.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Note that machines with smaller numbers of CPU cores and faster disks are more common for MySQL installations than machines with larger numbers of cores.

The first thing we needed to agree on is which tool to use. A fair comparison only makes sense if the workloads are as close as possible.

The standard PostgreSQL tool for performance tests is pgbench, while for MySQL it’s SysBench. SysBench supports multiple database drivers and scriptable tests in the Lua programming language, so we decided to use this tool for both databases.

The initial plan was to convert pgbench tests into SysBench Lua syntax, and then run standard tests on both databases. After initial results, we modified our tests to better examine specific MySQL and PostgreSQL features.

I converted pgbench tests into SysBench syntax, and put the tests into an open-database-bench GitHub repository.

And then we both faced difficulties.

As I wrote already, I also ran the tests on a Percona machine. For this converted test, the results were almost identical:

Percona machine:

OLTP test statistics:
       transactions:                        1000000 (28727.81 per sec.)
       read/write requests:                 5000000 (143639.05 per sec.)
       other operations:                    2000000 (57455.62 per sec.)

Freematiq machine:

OLTP test statistics:
       transactions:                        1000000 (29784.74 per sec.)
       read/write requests:                 5000000 (148923.71 per sec.)
       other operations:                    2000000 (59569.49 per sec.)

I started investigating. The only place where the Percona machine was better than Freematiq’s was disk speed. So I started running the pgbench read-only test, which was identical to SysBench’s point select test with full dataset in memory. But this time SysBench used 50% of available CPU resources:

4585 smirnova  20   0  0,157t 0,041t   9596 S  7226  1,4  12:27.16 mysqld
8745 smirnova  20   0 1266212 629148   1824 S  7126  0,0   9:22.78 sysbench

Alexander, in turn, had issues with SysBench, which could not create a high load on PostgreSQL when prepared statements were used:

93087 korotkov  20   0 9289440 3,718g   2964 S 242,6  0,1   0:32.82 sysbench
93161 korotkov  20   0 32,904g  81612  80208 S   4,0  0,0   0:00.47 postgres
93116 korotkov  20   0 32,904g  80828  79424 S   3,6  0,0   0:00.46 postgres
93118 korotkov  20   0 32,904g  80424  79020 S   3,6  0,0   0:00.47 postgres
93121 korotkov  20   0 32,904g  80720  79312 S   3,6  0,0   0:00.47 postgres
93128 korotkov  20   0 32,904g  77936  76536 S   3,6  0,0   0:00.46 postgres
93130 korotkov  20   0 32,904g  81604  80204 S   3,6  0,0   0:00.47 postgres
93146 korotkov  20   0 32,904g  81112  79704 S   3,6  0,0   0:00.46 postgres

We contacted SysBench author Alexey Kopytov, and he fixed MySQL issue. The solution is:

  • Use SysBench with options
    --percentile=0 --max-requests=0

      (reasonable CPU usage)

  • Use concurrency_kit branch (better concurrency and Lua processing)
  • Rewrite Lua scripts to support prepared statements (pull request:
  • Start both SysBench and mysqld with the jemalloc or tmalloc library pre-loaded

A fix for PostgreSQL is on the way. For now, Alexander converted a standard SysBench test into pgbench format and we stuck with it. Not much new for MySQL, but at least we had a baseline for comparison.

The next difficulty I faced was the default operating system parameters. To make the long story short, I changed them to the recommended ones (described below):

cpupower frequency-set --governor performance
kernel.sched_migration_cost_ns= 5000000
IO scheduler [deadline]

The same parameters were better for PostgreSQL performance as well. Alexander set his machine similarly.

After solving these issues we learned and implemented the following:

  • We cannot use a single tool (for now)
  • Alexander wrote a test for pgbench, imitating the standard SysBench tests
  • We are still not able to write custom tests because we use different tools

But we could use these tests as a baseline. After work done by Alexander, we stuck with the standard SysBench tests. I converted them to use prepared statements, and Alexander converted them into pgbench format.

I should mention that I was not able to get the same results for the Read Only and Point Select tests as Dimitri. They are close, but slightly slower. We need to investigate if this is the result of different hardware, or my lack of performance testing abilities. The results from the Read-Write tests are similar.

Another difference was between the PostgreSQL and MySQL tests. MySQL users normally have many connections. Setting the value of the variable


, and limiting the total number of parallel connections to thousands is not rare nowadays. While not recommended, people use this option even without the thread pool plugin. In real life, most of these connections are sleeping. But there is always a chance they all will used in cases of increased website activity.

For MySQL I tested up to 1024 connections. I used powers of two and multiplies of the number of cores: 1, 2, 4, 8, 16, 32, 36, 64, 72, 128, 144, 256, 512 and 1024 threads.

For Alexander, it was more important to test in smaller steps. He started from one thread and increased by 10 threads, until 250 parallel threads were reached. So you will see a more detailed graph for PostgreSQL, but no results after 250 threads.

Here are our comparison results.


PostgreSQL and MySQL

  • pgsql-9.6 is standard PostgreSQL
  • pgsql-9.6 + pgxact-align is PostgreSQL with this patch (more details can be found in this blog post)
  • MySQL-5.7 Dimitri is Oracle’s MySQL Server
  • MySQL-5.7 Sveta is Percona Server 5.7.15


PostgreSQL and MySQL


PostgreSQL and MySQL

Sync commit in PostgreSQL is a feature, similar to


 in InnoDB, and async commit is similar to



You see that the results are very similar: both databases are developing very fast and work with modern hardware well.

MySQL results which show 1024 threads for reference.


PostgreSQL and MySQL

OLTP RW with innodb_flush_log_at_trx_commit set to 1 and 2

PostgreSQL and MySQL

After receiving these results, we did a few feature-specific tests that will be covered in separate blog posts.

More Information

MySQL Options for OLTP RO and Point SELECT tests:

# general
table_open_cache = 8000
# files
# Monitoring
innodb_monitor_enable = '%'
performance_schema=OFF #cpu-bound, matters for performance
#Percona Server specific
# buffers
innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex
# InnoDB-specific
innodb_checksums=1 #Default is CRC32 in 5.7, very fast
innodb_doublewrite= 1 #
innodb_stats_persistent = 1
innodb_support_xa=0 #(We are read-only, but this option is deprecated)
innodb_spin_wait_delay=6 #(Processor and OS-dependent)
# perf special
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_adaptive_hash_index=0 (depends on workload, always check)

MySQL Options for OLTP RW:

#Open files
table_open_cache = 8000
table_open_cache_instances = 16
query_cache_type = 0
#Percona Server specific
#InnoDB General
innodb_spin_wait_delay=12 Good value for RO is 6, for RW and RC is 192
innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex
innodb_flush_neighbors = 0
innodb_change_buffering=none #can be inserts, workload-specific
optimizer_switch="index_condition_pushdown=off" #workload-specific

MySQL SysBench parameters.

 [ --test=/data/sveta/sysbench/sysbench/tests/db/oltp_prepared.lua | --test=/data/sveta/sysbench/sysbench/tests/db/oltp_simple_prepared.lua ]
 --db-driver=mysql --oltp-tables-count=8 --oltp-table-size=10000000
--mysql-table-engine=innodb --mysql-user=msandbox --mysql-password=msandbox
--num-threads=$i --max-requests=0 --max-time=300
--percentile=0 [--oltp-read-only=on --oltp-skip-trx=on]

PostgreSQL pgbench parameters:

$ git clone
$ cd pg_oltp_bench
$ make USE_PGXS=1
$ sudo make USE_PGXS=1 install
$ psql DB -f oltp_init.sql
$ psql DB -c "CREATE EXTENSION pg_oltp_bench;"
$ pgbench -c 100 -j 100 -M prepared -f oltp_ro.sql -T 300 -P 1 DB
$ pgbench -c 100 -j 100 -M prepared -f oltp_rw.sql -T 300 -P 1 DB

Features in MySQL 5.7 that significantly improved performance:

Anastasia: The initial findings of this research were announced at Percona Live Amsterdam 2016. More findings were added to the second version of the same talk given at Moscow HighLoad++ 2016. Hopefully the third iteration of this talk will be available at Percona Live Open Source Database Conference 2017 in Santa Clara. Stay tuned: the Percona Live Committee is working on the program!




















Benchmark MongoDB with sysbench

Benchmark MongoDB with sysbench

Benchmark MongoDB with sysbenchIn this blog post, we’ll discuss how to benchmark MongoDB with sysbench.

In an earlier post, I mentioned our use of sysbench-mongodb (via this fork) to run benchmarks of MongoDB servers. I now want to share our work extending sysbench to make it work with MongoDB.

If you’re not familiar with sysbench, it’s a great project developed by Alexey Kopytov that lets you run different types of benchmarks (referred to as “tests” by the tool), including database benchmarks. The database tests are implemented in Lua scripts, which means you can customize them as needed (or even write new ones from scratch) – something useful for simulating specific workloads.

All of the database tests in sysbench assume an SQL-based database, so instead of trying to shoehorn MongoDB tests into this framework I modified the connect/disconnect functions to handle MongoDB, and then implemented new functions specific for this database.

You can find the work (which is still in progress but usable, and in fact currently used by us in benchmarks) on the dev-mongodb-support-1.0 branch of our sysbench fork.

To use it, you just need to specify the –mongo-url argument (others too, as needed, but this is the one that must be present for sysbench to detect a MongoDB test is requested), and then provide the path to the Lua script you want to run. The following is an example:

--oltp-inserts=1 run

To build this branch, you’ll first need to build and install (or otherwise obtain) the mongo-c-driver project, as that is what we use to connect to MongoDB. Once that’s done, building is just a matter of running the following commands from the repo’s root:

sudo make install #optionally

The changes should not affect the other database tests in sysbench, though I have only verified that the MySQL ones continue to work.

Right now, the workload from sysbench-mongodb is implemented in Lua scripts (oltp.lua), and work is in progress to allow freeform operations to be created with new Lua scripts (by providing functions that take JSON as the argument). As an alternative, you may want to check out this much-less-tested (and currently unstable) branch based on luamongo. It already supports the creation of arbitrary workloads in Lua. In this case, you also need to build luamongo, which is included.

With either branch, you can add new tests by implementing new Lua scripts (though the dev-mongodb-support-1.0 branch still needs a few functions implemented on the C side to support arbitrary operations from the Lua side).

We think there are still some types of operations needed to improve sysbench’s usefulness for MongoDB, such as queries involving arrays, union, the $in operator, geospatial operators, and in place updates.

We hope you find this useful, and we welcome suggestions and bug reports to improve it.

Happy benchmarking!


The MySQL query cache: Worst enemy or best friend?

During the last couple of months I have been involved in an unusually high amount of performance audits for e-commerce applications running with Magento. And although the systems were quite different, they also had one thing in common: the MySQL query cache was very useful. That was counter-intuitive for me as I’ve always expected the query cache to be such a bottleneck that response time is better when the query cache is turned off no matter what. That lead me to run a few experiments to better understand when the query cache can be helpful.

Some context

The query cache is well known for its contentions: a global mutex has to be acquired for any read or write operation, which means that any access is serialized. This was not an issue 15 years ago, but with today’s multi-core servers, such serialization is the best way to kill performance.

However from a performance point of view, any query cache hit is served in a few tens of microseconds while the fastest access with InnoDB (primary lookup) still requires several hundreds of microseconds. Yes, the query cache is at least an order of magnitude faster than any query that goes to InnoDB.

A simple test

To better understand how good or bad the query cache can be, I set up a very simple benchmark:

  • 1M records were inserted in 16 tables.
  • A moderate write load (65 updates/s) was run with a modified version of the update_index.lua sysbench script (see the end of the post for the code).
  • The select.lua sysbench script was run, with several values for the --num-threads option.

Note that the test is designed to be unfavorable to the query cache as the whole dataset fits in the buffer pool and the SELECT statements are very simple. Also note that I configured the query cache to be large enough so that no entry was evicted from the cache due to low memory.

Results – MySQL query cache ON

First here are the results when the query cache is enabled:


This configuration scales well up to 4 concurrent threads, but then the throughput degrades very quickly. With 10 concurrent threads, SHOW PROCESSLIST is enough to show you that all threads spend all their time waiting for the query cache mutex. Okay, this is not a surprise.

Results – MySQL query cache OFF

When the query cache is disabled, this is another story:


Throughput scales well up to somewhere between 10 and 20 threads (for the record the server I was using had 16 cores). But more importantly, even at the higher concurrencies, the overall throughput continued to increase: at 20 concurrent threads, MySQL was able to serve nearly 3x more queries without the query cache.


With Magento, you can expect to have a light write workload, very low concurrency and also quite complex SELECT statements. Given the results of our simple benchmarks, it is finally not that surprising that the MySQL query cache is a good fit in this case.

It is also worth noting that many applications run a database workload where writes are light and concurrency is low: the query cache should then not be discarded immediately. And maybe it is time for Oracle to make plans to improve the query cache as suggested by Peter a few years ago?

Annex: sysbench commands

# Modified update_index.lua
function event(thread_id)
   local table_name
   table_name = "sbtest".. sb_rand_uniform(1, oltp_tables_count)
   rs = db_query("UPDATE ".. table_name .." SET k=k+1 WHERE id=" .. sb_rand(1, oltp_table_size))
   db_query("SELECT SLEEP(0.015)")

# Populate the tables
sysbench --mysql-socket=/data/mysql/mysql.sock --mysql-user=root --mysql-db=db1 --oltp-table-size=1000000 --oltp-tables-count=16 --num-threads=16 --test=/usr/share/doc/sysbench/tests/db/insert.lua prepare
# Write workload
sysbench --mysql-socket=/data/mysql/mysql.sock --mysql-user=root --mysql-db=db1 --oltp-tables-count=16 --num-threads=1 --test=/usr/share/doc/sysbench/tests/db/update_index.lua --max-requests=1000000 run
# Read workload
sysbench --mysql-socket=/data/mysql/mysql.sock --mysql-user=root --mysql-db=db1 --oltp-tables-count=16 --num-threads=1 --test=/usr/share/doc/sysbench/tests/db/select.lua --max-requests=10000000 run

The post The MySQL query cache: Worst enemy or best friend? appeared first on MySQL Performance Blog.


Q&A: High availability when using MySQL in the cloud

Percona MySQL webinar followup: Q&ALast week I hosted a webinar on using MySQL in the cloud for High Availability (HA) alongside 451 Research analyst Jason Stamper. You can watch the recording and also download the slides (free) here. Just click the “Register” button at the end of that page.

We had several excellent questions and we didn’t have time to get to several of them in the allotted time. I’m posting them here along with the answers. Feel free to ask follow-up questions in the comments below.

Q: Can the TokuDB engine be used in a PXC environment?

A: No, TokuDB cannot currently be used in a PXC environment, the only supported engine in Percona XtraDB Cluster 5.6 is InnoDB.

Q: With Galera replication (PXC), is balancing the load on each node?

A: No, you need to implement your own load balancing and HA layer between your clients and the Percona XtraDB Cluster server.  Examples mentioned in the webinar include HAProxy and F5 BigIP.

Q: What’s the best version of Percona XtraDB Cluster regarding InnoDB performance?

A: In general for best performance you should be using the latest release of Percona XtraDB Cluster 5.6, which is currently 5.6.24, released on June 3rd, 2015.

Q: Can I redirect my writes in Percona XtraDB Cluster to multiple nodes using the HAProxy? While trying with SysBench I can see write-only goes to first nodes in PXC while reads does goes to multiple nodes.

A: Yes you can configure HAProxy to distribute both reads and writes across all of your nodes in a Percona XtraDB Cluster environment. Perhaps SysBench created only one database connection for all writes, and so haproxy kept those confined to only one host. You may want to experiment with parallel_prepare.lua.

Q: What’s the optimal HA for small datasets (db is less than 10gb)?

A: The optimal HA deployment for small datasets would be dependent on your level of recovery required (tolerance for loss of transactions) and time that you can be in an unavailable state (seconds, minutes, hours?).  Unfortunately there isn’t a single answer to your question, however, if you are interested in further discussion on this point Percona would be happy to coordinate a time to speak.  Please feel free to contact me directly and we can continue the conversation at

 Q: Is there a concept of local master vs. remote master with PXC?

A: No there is no concept of local vs remote master.  All nodes in a Percona XtraDB Cluster can now be classified as Master, regardless of their proximity to the clients.

Q: Are there any concerns when considering AWS RDS or AURORA DB for MySQL HA in the Cloud?

A: Regarding AWS RDS, yes this a good option for MySQL HA in the Cloud.  I unfortunately haven’t worked with Aurora DB that much yet so I don’t have an opinion on it’s suitability for HA in the Cloud.

Q: We tried out PXC awhile back and it used to lock everything whenever any ddl was done. Has that changed?

A: We would have to look at the specifics of your environment, however, there have been numerous improvements in the 1½ years of development since Percona XtraDB Cluster went Generally Available (GA) on January 30th, 2014 in version 5.6.15.

Q: Is using the arbitrator a must?

A: No the arbitrator role via the garbd daemon is generally only used when operating in a minimal environment of two nodes that contain the data and you need a third node for quorum but don’t want to store the data a third time.

Q: Can we do a cluster across different zones?

A: Yes you can. However be aware that the latency incurred for all cluster certification operations will be impacted by the round trip time between nodes.

Q: Does PXC also support the MyISAM database?

A: No, Percona XtraDB Cluster does not support any storage engine other than InnoDB as of PXC 5.6.

Q: How do load balancers affect the throughput in a Galera-based setup given that the write would be limited by the slowest node?

A: Load balancers will introduce some measure of additional latency in the form of CPU time in the load balancer layer as it evaluates its own ruleset, and also in network time due to additional hop via load balancer.  Otherwise there should be no perceptible difference in the write throughput of a Percona XtraDB Cluster with and without a load balancer as it relates to the “slowest node” factor.

Q: Have you used MaxScale yet? If so, what are your thoughts?

A: Unfortunately I haven’t used MaxScale however Yves Trudeau, Percona Principal Architect, has recently written about MaxScale in this blog post.

Q: How do you configure timeout and maintain persistent connection to HAProxy?

A: I would encourage you to refer to the HAProxy Documentation.

The post Q&A: High availability when using MySQL in the cloud appeared first on MySQL Performance Blog.


Update on the InnoDB double-write buffer and EXT4 transactions

In a post, written a few months ago, I found that using EXT4 transactions with the “data=journal” mount option, improves the write performance significantly, by 55%, without putting data at risk. Many people commented on the post mentioning they were not able to reproduce the results and thus, I decided to further investigate in order to find out why my results were different.

So, I ran sysbench benchmarks on a few servers and found when the InnoDB double-write buffer limitations occur and when they don’t. I also made sure some of my colleagues were able to reproduce the results. Basically, in order to reproduce the results you need the following conditions:

  • Spinning disk (no SSD)
  • Enough CPU power
  • A dataset that fits in the InnoDB buffer pool
  • A continuous high write load with many ops waiting for disk

Using the InnoDB double write buffer on an SSD disk somewhat prevents us from seeing the issue, something good performance wise. That comes from the fact that the latency of each write operation is much lower. That makes sense, the double-writer buffer is an area of 128 pages on disk that is used by the write threads. When a write thread needs to write a bunch of dirty pages to disk, it first writes them sequentially to free slots in the double write buffer in a single iop and then, it spends time writing the pages to their actual locations on disk using typically one iop per page. Once done writing, it releases the double-write buffer slots it was holding and another thread can do its work. The presence of a raid controller with a write cache certainly helps, at least until the write cache is full. Thus, since I didn’t tested with a raid controller, I suspect a raid controller write cache will delay the apparition of the symptoms but if the write load is sustained over a long period of time, the issue with the InnoDB double write buffer will appear.

So, to recapitulate, on a spinning disk, a write thread needs to hold a lock on some of the double-write buffer slots for at least a few milliseconds per page it needs to write while on a SSD disk, the slots are released very quickly because of the low latency of the SSD storage. To actually stress the InnoDB double-write buffer on a SSD disk, one must push much more writes.

That leads us to the second point, the amount of CPU resources available. At first, one of my colleague tried to reproduce the results on a small EC2 instance and failed. It appeared that by default, the sysbench oltp.lua script is doing quite a lot of reads and those reads saturate the CPU, throttling the writes. By lowering the amount of reads in the script, he was then able to reproduce the results.

For my benchmarks, I used the following command:

sysbench --num-threads=16 --mysql-socket=/var/lib/mysql/mysql.sock
--mysql-database=sbtest --mysql-user=root
--test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-table-size=50000000
--oltp-test-mode=complex --mysql-engine=innodb --db-driver=mysql
--report-interval=60 --max-requests=0 --max-time=3600 run

Both servers used were metal boxes with 12 physical cores (24 HT). With less CPU resources, I suggest adding the following parameters:


So that the CPU is not wasted on reads and enough writes are generated. Remember we are not doing a generic benchmarks, we are just stressing the InnoDB double-write buffer.

In order to make sure something else isn’t involved, I verified the following:

  • Server independence, tried on 2 physical servers and one EC2 instance, Centos 6 and Ubuntu 14.04
  • MySQL provided, tried on MySQL community and Percona Server
  • MySQL version, tried on 5.5.37 and 5.6.23 (Percona Server)
  • Varied the InnoDB log file size from 32MB to 512MB
  • The impacts of the number of InnoDB write threads (1,2,4,8,16,32)
  • The use of Linux native asynchronous iop
  • Spinning and SSD storage

So, with all those verifications done, I can maintain that if you are using a server with spinning disks and a high write load, using EXT4 transactions instead of the InnoDB double write buffer yields to an increase in throughput of more than 50%. In an upcoming post, I’ll show how the performance stability is affected by the InnoDB double-write buffer under a high write load.

Appendix: the relevant part of the my.cnf

innodb_buffer_pool_size = 12G
innodb_write_io_threads = 8 # or else in {1,2,4,8,16,32}
innodb_read_io_threads = 8
innodb_flush_log_at_trx_commit = 0 # must be 0 or 2 to really stress the double write buffer
innodb_log_file_size = 512M # or 32M, 64M
innodb_log_files_in_group = 2
innodb_flush_method=O_DIRECT # or O_DSYNC
innodb_buffer_pool_restore_at_startup=300 # On 5.5.x, important to warm up the buffer pool
#innodb_buffer_pool_load_at_startup=ON # on 5.6, important to warm up the buffer pool
#innodb_buffer_pool_dump_at_shutdown=ON # on 5.6, important to warm up the buffer pool,
skip-innodb_doublewrite # or commented out
innodb_flush_neighbor_pages=none # or area for spinning

The post Update on the InnoDB double-write buffer and EXT4 transactions appeared first on MySQL Performance Blog.


LinkBenchX: benchmark based on arrival request rate

An idea for a benchmark based on the “arrival request” rate that I wrote about in a post headlined “Introducing new type of benchmark” back in 2012 was implemented in Sysbench. However, Sysbench provides only a simple workload, so to be able to compare InnoDB with TokuDB, and later MongoDB with Percona TokuMX, I wanted to use more complicated scenarios. (Both TokuDB and TokuMX are part of Percona’s product line, in the case you missed Tokutek now part of the Percona family.)

Thanks to Facebook – they provide LinkBench, a benchmark that emulates the social graph database workload. I made modifications to LinkBench, which are available here: The summary of modifications is

  • Instead of generating events in a loop, we generate events with requestrate and send the event for execution to one of available Requester thread.
  • At the start, we establish N (requesters) connections to database, which are idle by default, and just waiting for an incoming event to execute.
  • The main output of the benchmark is 99% response time for ADD_LINK (INSERT + UPDATE request) and GET_LINKS_LIST (range SELECT request) operations.
  • The related output is Concurrency, that is how many Requester threads are active during the time period.
  • Ability to report stats frequently (5-10 sec interval); so we can see a trend and a stability of the result.

Also, I provide a Java package, ready to execute, so you do not need to compile from source code. It is available on the release page at

So the main focus of the benchmark is the response time and its stability over time.

For an example, let’s see how TokuDB performs under different request rates (this was a quick run to demonstrate the benchmark abilities, not to provide numbers for TokuDB).

First graph is the 99% response time (in milliseconds), measured each 10 sec, for arrival rate 5000, 10000 and 15000 operations/sec:


or, to smooth spikes, the same graph, but with Log 10 scale for axe Y:

So there are two observations: the response time increases with an increase in the arrival rate (as it supposed to be), and there are periodical spikes in the response time.

And now we can graph Concurrency (how many Threads are busy working on requests)…

…with an explainable observation that more threads are needed to handle bigger arrival rates, and also during spikes all available 200 threads (it is configurable) become busy.

I am looking to adopt LinkBenchX to run an identical workload against MongoDB.
The current schema is simple

CREATE TABLE `linktable` (
  `id1` bigint(20) unsigned NOT NULL DEFAULT '0',
  `id2` bigint(20) unsigned NOT NULL DEFAULT '0',
  `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
  `visibility` tinyint(3) NOT NULL DEFAULT '0',
  `data` varchar(255) NOT NULL DEFAULT '',
  `time` bigint(20) unsigned NOT NULL DEFAULT '0',
  `version` int(11) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (link_type, `id1`,`id2`),
  KEY `id1_type` (`id1`,`link_type`,`visibility`,`time`,`id2`,`version`,`data`)
CREATE TABLE `counttable` (
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
  `count` int(10) unsigned NOT NULL DEFAULT '0',
  `time` bigint(20) unsigned NOT NULL DEFAULT '0',
  `version` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`link_type`)
CREATE TABLE `nodetable` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `type` int(10) unsigned NOT NULL,
  `version` bigint(20) unsigned NOT NULL,
  `time` int(10) unsigned NOT NULL,
  `data` mediumtext NOT NULL,

I am open for suggestions as to what is the proper design of documents for MongoDB – please leave your recommendations in the comments.

The post LinkBenchX: benchmark based on arrival request rate appeared first on MySQL Performance Blog.


Measuring the impact of tcpdump on Very Busy Hosts

A few years back Deva wrote about how to use tcpdump on very busy hosts. That post sparked my interest about exploring how to measure the impact of tcpdump on very busy hosts. In this post, I wanted to highlight how much of an impact there really is and what options you have to make the query collection much more effective.

Some things you need to know:

  • The test is a sysbench read-only workload, 8 tables, 8 threads, 1000000 rows each with 16G of buffer pool. Dataset fully in memory.
  • sysbench is ran on the same host, on 1Gbps connection, sysbench can saturate the network and therefore affect my network test with netcat so I decided to run locally.
  • There are 13 tests, 5 minutes each with 1 minute interval, varying on how the dump file is captured.
    • First one as baseline is the MySQL slow query log.

      mysql -e 'set global long_query_time=0, slow_query_log=1; select sleep(300); set global long_query_time=1, slow_query_log=0;'
    • Second group is tcpdump with -w option, which means tcpdump itself is writing to the capture file.

      $DUMPCMD -i any -G 300 -W 1 -Z root -w tcpdump.pcap port 3306

      $DUMPCMD -i any -G 300 -W 1 -Z root -w tcpdump.pcap 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'

      $DUMPCMD -i any -G 300 -W 1 -Z root -w tcpdump.pcap 'port 3306 and ( tcp[1] & 7 == 2 or tcp[3] & 7 == 2 )'
    • Third group, is using “packet-buffered” (-U option) to see if there will be improvement on response time.

      $DUMPCMD -i any -G 300 -W 1 -Z root -U -w tcpdump.pcap port 3306

      $DUMPCMD -i any -G 300 -W 1 -Z root -U -w tcpdump.pcap 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'

      $DUMPCMD -i any -G 300 -W 1 -Z root -U -w tcpdump.pcap 'port 3306 and ( tcp[1] & 7 == 2 or tcp[3] & 7 == 2 )'
    • Next streams the backup to a remote location via netcat.

      $DUMPCMD -i any -G 300 -W 1 -Z root -w - port 3306 | nc remote_ip 33061

      $DUMPCMD -i any -G 300 -W 1 -Z root -U -w - port 3306 | nc remote_ip 33062

      $DUMPCMD -i any -G 300 -W 1 -Z root -U -w - 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2' | nc remote_ip 33063
    • The last group, the one most of us are probably accustomed with is piping the dumped packets to file.

      timeout -s KILL 300 $DUMPCMD -i any port 3306 > tcpdump.pcap

      timeout -s KILL 300 $DUMPCMD -i any 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2' > tcpdump.pcap

      timeout -s KILL 300 $DUMPCMD -i any 'port 3306 and ( tcp[1] & 7 == 2 or tcp[3] & 7 == 2 )' > tcpdump.pcap
    • $DUMPCMD

        is defined as

      tcpdump -s 65535 -x -nn -q -tttt
  • On each group there is an AND and OR variation in port filtering. I wanted to see whether how much of additional impact port expressions have. And as you will see below, they do not have significant impact on performance, but on number of queries captured.

I’ve graphed the sysbench data during the test and labeled each test for easy comparison on the graphs.

Sysbench_TPS-legend Sysbench_Response_Time-legened

Of course, I’ve also compared the size of resulting tcpdump capture and total queries identified when run through pt-query-digest.


  • We see that, piping the pcap (K, L, M) data as decoded packets has significant overhead in terms of number of captured queries, response time and reads requests completed.
  • Using the slow log has about 30% overhead in response time, nearly 20% drop in throughput but have highest number of queries captured.
  • Writing captured packets directly to binary file using the -w option has the lowest overhead in response time, around 10%. Throughput drops depending on how much filtering is involved though while also there are noticeable stalls when the operating system flushes the page cache. This side effect causes sysbench to drop to 0 reads or even reach response times of several seconds!
  • Streaming packets to a capable remote server in terms of network bandwidth, IO performance combined with -w option to capture binary data produces 20-25% overhead in response time, 10-15% drop in throughput, no stalls and number of queries captured as close to slow query log.


Use tcpdump -w option in all cases and decode later. If you are looking for an overall view of ALL your queries, streaming tcpdump data to remote is also ideal. If you have low bandwidth though i.e. 100Mbps, this might not be enough as 5mins of binary tcpdump data produced 31G of file. That is 105MBps requirement! In which case, consider writing to a separate partition with enough IO.

If you are using Percona Server or MariaDB and is only looking to capture a portion of your workload i.e. table scans, temp table on disk or rate limit the collection, the extended slow query logging capability with this versions are also an excellent way to capture the data you need.

The post Measuring the impact of tcpdump on Very Busy Hosts appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by