Sep
22
2017
--

How to Deal with XA Transactions Recovery

XA Transactions

For most people (including me until recently) database XA transactions are a fuzzy concept. In over eight years with Percona, I have never had to deal with XA transactions. Then a few weeks ago I got two customers having issues with XA transactions. That deserves a post.

XA 101

What are XA transactions? XA transactions are useful when you need to coordinate a transaction between different systems. The simplest example is coordinating a transaction across two storage engines within MySQL. Basically, it follows this sequence (a minimal example follows the list):

  1. XA START
  2. Some SQL statements
  3. XA END
  4. XA PREPARE
  5. XA COMMIT or ROLLBACK
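For concreteness, here is roughly what that sequence looks like from the mysql client. This is a minimal sketch: the table t1 is made up for the example, and the xid shown is the same one used later in this post (gtrid X'01020304', bqual 'bqual', formatID 1234):

mysql> XA START X'01020304','bqual',1234;
mysql> INSERT INTO t1 (id) VALUES (1);
mysql> XA END X'01020304','bqual',1234;
mysql> XA PREPARE X'01020304','bqual',1234;
mysql> XA COMMIT X'01020304','bqual',1234;   -- or XA ROLLBACK with the same xid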

Once prepared, the XA transaction survives a MySQL crash. Upon restart, you’ll see something like this in the MySQL error log:

2017-08-23T14:53:54.189068Z 0 [Note] Starting crash recovery...
2017-08-23T14:53:54.189204Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2017-08-23T14:53:54.189225Z 0 [Note] InnoDB: Transaction 45093 in prepared state after recovery
2017-08-23T14:53:54.189244Z 0 [Note] InnoDB: Transaction contains changes to 2 rows
2017-08-23T14:53:54.189257Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2017-08-23T14:53:54.189267Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2017-08-23T14:53:54.189312Z 0 [Warning] Found 1 prepared XA transactions
2017-08-23T14:53:54.189329Z 0 [Note] Crash recovery finished.
2017-08-23T14:53:54.189472Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2017-08-23T14:53:54.189489Z 0 [Note] InnoDB: Transaction 45093 in prepared state after recovery
2017-08-23T14:53:54.189501Z 0 [Note] InnoDB: Transaction contains changes to 2 rows
2017-08-23T14:53:54.189520Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2017-08-23T14:53:54.189529Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2017-08-23T14:53:54.189539Z 0 [Warning] Found 1 prepared XA transactions

The command xa recover shows you an output like this:

mysql> xa recover;
+----------+--------------+--------------+-----------+
| formatID | gtrid_length | bqual_length | data      |
+----------+--------------+--------------+-----------+
|     1234 |            4 |            5 |  bqual |
+----------+--------------+--------------+-----------+
1 row in set (0.00 sec)

There is some binary data in the output that can’t be shown in HTML. The XA Xid is made of three fields: gtrid (global trx id), bqual (branch qualifier) and formatID. Java applications use all three fields. For my example above, I used “X'01020304','bqual',1234”. You can trust Java application servers to be creative with Xid values. With MySQL 5.7, you can output the data part in hex with convert xid:

mysql> xa recover convert xid;
+----------+--------------+--------------+----------------------+
| formatID | gtrid_length | bqual_length | data                 |
+----------+--------------+--------------+----------------------+
|     1234 |            4 |            5 | 0x01020304627175616C |
+----------+--------------+--------------+----------------------+
1 row in set (0.01 sec)

The Problem

If you do nothing, the prepared transaction stays there forever and holds locks and a read view open. As a consequence, the history list grows without bound along with your ibdata1 file, where the undo entries are kept. If you have slaves, they all have the prepared transaction too (at least with 5.7). No fun.
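The original post doesn’t show it, but if you want to watch that history list grow, InnoDB reports it in the TRANSACTIONS section of the engine status output, and on 5.7 you can also query the metrics table directly:

mysql> SHOW ENGINE INNODB STATUS\G
-- look for the "History list length" line in the TRANSACTIONS section
mysql> SELECT name, count FROM information_schema.innodb_metrics WHERE name = 'trx_rseg_history_len';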

As a consequence, if you are using XA transactions, you MUST check for pending prepared transactions after the server or mysqld restarts. If you find such transactions, you need to commit or roll them back, depending on what is involved.

But how do you commit these XA transactions? The problem here is the output of xa recover. As it is, the output is unusable if there is a bqual field or a non-default formatID field:

mysql> xa commit 0x01020304627175616C;
ERROR 1397 (XAE04): XAER_NOTA: Unknown XID

The Fix

Looking back at the xa recover convert xid output above, the gtrid_length and bqual_length are provided. Using these values, you can split the data field into its parts, which gives us:

  • gtrid = 0x01020304
  • bqual = 0x627175616C

And, of course, the formatID is 1234. Altogether, we have:

mysql> xa commit 0x01020304,0x627175616C,1234;
Query OK, 0 rows affected (0.15 sec)

Which finally works! On 5.6 the convert xid option is not available. You have to be a bit more creative:

root@master57:/var/lib/mysql# mysql -r -e 'xa recover\G' | hexdump -C
00000000  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 2a 2a  |****************|
00000010  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 20 31 2e 20 72  |*********** 1. r|
00000020  6f 77 20 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 2a 2a  |ow *************|
00000030  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 0a 20  |**************. |
00000040  20 20 20 66 6f 72 6d 61  74 49 44 3a 20 31 32 33  |   formatID: 123|
00000050  34 0a 67 74 72 69 64 5f  6c 65 6e 67 74 68 3a 20  |4.gtrid_length: |
00000060  34 0a 62 71 75 61 6c 5f  6c 65 6e 67 74 68 3a 20  |4.bqual_length: |
00000070  35 0a 20 20 20 20 20 20  20 20 64 61 74 61 3a 20  |5.        data: |
00000080  01 02 03 04 62 71 75 61  6c 0a                    |....bqual.|
0000008a

But there is a limitation in 5.6: you can only XA commit/rollback transactions that belong to your session. That means after a crash you are out of luck. To get rid of these you need to promote a slave or perform a logical dump and restore. The best plan is to avoid the use of XA transactions with 5.6.

I submitted this bug against Percona Server for MySQL in order to get a usable output out of xa recover convert xid. If you think this is important, vote for it!

Sep
22
2017
--

This Week in Data with Colin Charles #7: Percona Live Europe and Open Source Summit North America

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Europe Dublin

Are you affected by the Ryanair flight cancellations? Have you made alternative arrangements? Have you registered for the community dinner? Even speakers have to register, as it is a separate ticket with its own cost! There will be fun lightning talks in addition to food and drink.

You are, of course, already registered for Percona Live Europe Dublin, right? See you there! Don’t forget to pack a brolly, or a rain jacket (if this week’s weather is anything to go by).

Open Source Summit North America

Last week, a lot of open source folk were in Los Angeles, California for the annual Open Source Summit North America (formerly known as LinuxCon). I’ve been to many as a speaker, and have always loved going to the event (so save the date, in 2018 it is August 29-31 in Vancouver, British Columbia, Canada).

What were major themes this year? Containerization. Everyone (large and small) seems to be moving workloads into containers. Containers and stateful applications make things all the more interesting, as do thoughts on performance. This is a big deal for us in the MySQL/MongoDB/other open source database space. Technologies to watch include Docker/Moby, Kubernetes, and Mesos. These are technologies people are frankly already deploying on, and it looks like the on-ramp is coming. Videos to watch:

The cloud is still a big deal. Yes, people are all customers of Amazon Web Services. Sure they are looking at Microsoft Azure. Google Cloud Platform is – from my informal survey – the third most popular. In many instances, I had conversations about Oracle Cloud, and it looks like there is a huge push behind this (but not too many users that I’ve seen yet). So it’s still a bet on the future as it continues to be developed by engineers. A mention of Rackspace Cloud (which offers all the MySQL variants in the cloud) is good, but many large-scale shops haven’t thought about it.

There were also some “fun” keynotes:

I wish more events had this kind of diverse keynote lineup.

From a speaker standpoint, I enjoyed the speaker/sponsor dinner party (a great time to catch up with friends and meet new ones), as well as the t-shirt and speaker gift (wooden board). I had a great time at the attendee expo hall reception and the party at Paramount Studios (lots of fun catered things, like In-N-Out burgers!).

Releases

  • ProxySQL 1.4.3. Hot on the heels of 1.4.2 comes 1.4.3, nicknamed “The ClickHouse release.” Clients can connect to ProxySQL, and it will query a ClickHouse backend. Should be exciting for ClickHouse users. Don’t forget the SQLite support, too!
  • Percona XtraDB Cluster 5.6.37-26.21
  • MariaDB ColumnStore 1.1.0 Beta. Do you use ColumnStore? Or do you use ClickHouse? There’s a new beta that might be worth trying.
  • MySQL 8.0.3 Release Candidate. Download this on your way to Percona Live Europe Dublin! Try it. There are many talks for this, including a keynote. You’ll find things like Histograms, more improvements around the optimizer, JSON and GIS improvements, security improvements, resource groups (which seem very interesting), data dictionary changes and a whole lot more!

Link List

  • CallidusCloud Acquires OrientDB, the Leading Multi-Model Database Technology
  • Database provider MongoDB has filed to go public. Bound to happen, and some highlights according to TechCrunch: “The company brought in $101.4 million in revenue in the most recent year ending January 31, and around $68 million in the first six months ending July 31 this year. In that same period, MongoDB burned through $86.7 million in the year ending January 31 and $45.8 million in the first six months ending July 31. MongoDB’s revenue is growing, and while its losses seem to be stable, they aren’t shrinking either. There have been over 30 million downloads of MongoDB community, and the link also has a nice cap table pie chart.”

Upcoming appearances

Percona’s website keeps track of community events, so check that out and see where to listen to a Perconian speak. My upcoming appearances are:

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Sep
20
2017
--

sysbench Histograms: A Helpful Feature Often Overlooked

Sysbench Histograms

In this blog post, I will demonstrate how to run and use sysbench histograms.

One of the features of sysbench that I often see overlooked (and rarely used) is its ability to produce detailed query response time histograms in addition to computing percentile numbers. Looking at histograms together with throughput or latency over time provides many additional insights into query performance.

Here is how you get detailed sysbench histograms and performance over time:

sysbench --rand-type=uniform --report-interval=1 --percentile=99 --time=300 --histogram --mysql-password=sbtest oltp_point_select --table_size=400000000 run

There are a few command line options to consider:

  • report-interval=1 prints out the current performance measurements every second, which helps you see whether performance is uniform, or whether you have stalls or otherwise high variance
  • percentile=99 computes the 99th percentile response time, rather than the 95th percentile (the default); I like looking at 99th percentile stats as it is a better measure of performance
  • histogram=on produces a histogram at the end of the run (as shown below)

The first thing to note about this histogram is that it is exponential. This means the width of the buckets changes with higher values. It starts with 0.001 ms (one microsecond) and gradually grows. This design is used so that sysbench can deal with workloads with requests that take small fractions of milliseconds, as well as accommodate requests that take many seconds (or minutes).

Next, we learn some very interesting things about the typical request response time distribution for databases. You might think that this distribution would be close to some “academic” distribution, such as the normal distribution. In reality, what we often observe is something of a “camelback” distribution (not a real term) – and our “camel” can have more than two humps (especially for simple requests such as the single primary key lookup shown here).

Why do request response times tend to have this distribution? It is because requests can take multiple paths inside the database. For example, certain requests might get responses from the MySQL Query Cache (which will result in the first hump). A second hump might come from resolving lookups using the InnoDB Adaptive Hash Index. A third hump might come from finding all the data in memory (rather than through the Adaptive Hash Index). Finally, another hump might coalesce around the time (or times) it takes to execute requests that require disk IO.

You also will likely see some long-tail data that highlights the fact that MySQL and Linux are not hard, real-time systems. As an example, this very simple run with a single thread (and thus no contention) has an outlier at around 18ms. Most of the requests are served within 0.2ms or less.

As you add contention, row-level locking, group commit and other issues, you are likely to see even more complicated diagrams – which can often show you something unexpected:

Latency histogram (values are in milliseconds)
      value  ------------- distribution ------------- count
      0.050 |                                         1
      0.051 |                                         2
      0.052 |                                         2
      0.053 |                                         54
      0.053 |                                         79
      0.054 |                                         164
      0.055 |                                         883
      0.056 |*                                        1963
      0.057 |*                                        2691
      0.059 |**                                       4047
      0.060 |****                                     9480
      0.061 |******                                   15234
      0.062 |********                                 20723
      0.063 |********                                 20708
      0.064 |**********                               26770
      0.065 |*************                            35928
      0.066 |*************                            34520
      0.068 |************                             32247
      0.069 |************                             31693
      0.070 |***************                          41682
      0.071 |**************                           37862
      0.073 |********                                 22691
      0.074 |******                                   15907
      0.075 |****                                     10509
      0.077 |***                                      7853
      0.078 |****                                     9880
      0.079 |****                                     10853
      0.081 |***                                      9243
      0.082 |***                                      9280
      0.084 |***                                      8947
      0.085 |***                                      7869
      0.087 |***                                      8129
      0.089 |***                                      9073
      0.090 |***                                      8364
      0.092 |***                                      6781
      0.093 |**                                       4672
      0.095 |*                                        3356
      0.097 |*                                        2512
      0.099 |*                                        2177
      0.100 |*                                        1784
      0.102 |*                                        1398
      0.104 |                                         1082
      0.106 |                                         810
      0.108 |                                         742
      0.110 |                                         511
      0.112 |                                         422
      0.114 |                                         330
      0.116 |                                         259
      0.118 |                                         203
      0.120 |                                         165
      0.122 |                                         126
      0.125 |                                         108
      0.127 |                                         87
      0.129 |                                         83
      0.132 |                                         55
      0.134 |                                         42
      0.136 |                                         45
      0.139 |                                         41
      0.141 |                                         149
      0.144 |                                         456
      0.147 |                                         848
      0.149 |*                                        2128
      0.152 |**                                       4586
      0.155 |***                                      7592
      0.158 |*****                                    13685
      0.160 |*********                                24958
      0.163 |*****************                        44558
      0.166 |*****************************            78332
      0.169 |*************************************    98616
      0.172 |**************************************** 107664
      0.176 |**************************************** 107154
      0.179 |****************************             75272
      0.182 |******************                       49645
      0.185 |****************                         42793
      0.189 |*****************                        44649
      0.192 |****************                         44329
      0.196 |******************                       48460
      0.199 |*****************                        44769
      0.203 |**********************                   58578
      0.206 |***********************                  61373
      0.210 |**********************                   58758
      0.214 |******************                       48012
      0.218 |*************                            34533
      0.222 |**************                           36517
      0.226 |*************                            34645
      0.230 |***********                              28694
      0.234 |*******                                  17560
      0.238 |*****                                    12920
      0.243 |****                                     10911
      0.247 |***                                      9208
      0.252 |****                                     10556
      0.256 |***                                      7561
      0.261 |**                                       5047
      0.266 |*                                        3757
      0.270 |*                                        3584
      0.275 |*                                        2951
      0.280 |*                                        2078
      0.285 |*                                        2161
      0.291 |*                                        1747
      0.296 |*                                        1954
      0.301 |*                                        2878
      0.307 |*                                        2810
      0.312 |*                                        1967
      0.318 |*                                        1619
      0.324 |*                                        1409
      0.330 |                                         1205
      0.336 |                                         1193
      0.342 |                                         1151
      0.348 |                                         989
      0.354 |                                         985
      0.361 |                                         799
      0.367 |                                         671
      0.374 |                                         566
      0.381 |                                         537
      0.388 |                                         351
      0.395 |                                         276
      0.402 |                                         214
      0.409 |                                         143
      0.417 |                                         80
      0.424 |                                         85
      0.432 |                                         54
      0.440 |                                         41
      0.448 |                                         29
      0.456 |                                         16
      0.464 |                                         15
      0.473 |                                         11
      0.481 |                                         4
      0.490 |                                         9
      0.499 |                                         4
      0.508 |                                         3
      0.517 |                                         4
      0.527 |                                         4
      0.536 |                                         2
      0.546 |                                         4
      0.556 |                                         4
      0.566 |                                         4
      0.587 |                                         1
      0.597 |                                         1
      0.608 |                                         5
      0.619 |                                         3
      0.630 |                                         2
      0.654 |                                         2
      0.665 |                                         5
      0.677 |                                         26
      0.690 |                                         298
      0.702 |                                         924
      0.715 |*                                        1493
      0.728 |                                         1027
      0.741 |                                         1112
      0.755 |                                         1127
      0.768 |                                         796
      0.782 |                                         574
      0.797 |                                         445
      0.811 |                                         415
      0.826 |                                         296
      0.841 |                                         245
      0.856 |                                         202
      0.872 |                                         210
      0.888 |                                         168
      0.904 |                                         217
      0.920 |                                         163
      0.937 |                                         157
      0.954 |                                         204
      0.971 |                                         155
      0.989 |                                         158
      1.007 |                                         137
      1.025 |                                         94
      1.044 |                                         79
      1.063 |                                         52
      1.082 |                                         36
      1.102 |                                         25
      1.122 |                                         25
      1.142 |                                         16
      1.163 |                                         8
      1.184 |                                         5
      1.205 |                                         7
      1.227 |                                         2
      1.250 |                                         4
      1.272 |                                         3
      1.295 |                                         3
      1.319 |                                         2
      1.343 |                                         2
      1.367 |                                         1
      1.417 |                                         2
      1.791 |                                         1
      1.996 |                                         2
      2.106 |                                         2
      2.184 |                                         1
      2.264 |                                         1
      2.347 |                                         2
      2.389 |                                         1
      2.433 |                                         1
      2.477 |                                         1
      2.568 |                                         2
      2.615 |                                         1
      2.710 |                                         1
      2.810 |                                         1
      2.861 |                                         1
      3.187 |                                         1
      3.488 |                                         1
      3.816 |                                         1
      4.028 |                                         1
      6.913 |                                         1
      7.565 |                                         1
      8.130 |                                         1
     17.954 |                                         1

I hope you give sysbench histograms a try, and see what you can discover!

Sep
19
2017
--

Percona Live Europe Featured Talks: Automatic Database Management System Tuning Through Large-Scale Machine Learning with Dana Van Aken

Percona Live Europe 2017

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Dana Van Aken, a Ph.D. student in Computer Science at Carnegie Mellon University. Her talk is titled Automatic Database Management System Tuning Through Large-Scale Machine Learning. DBMSs are difficult to manage because they have hundreds of configuration “knobs” that control factors such as the amount of memory to use for caches and how often to write data to storage. Organizations often hire experts to help with tuning activities, but experts are prohibitively expensive for many. In this talk, Dana will present OtterTune, a new tool that can automatically find good settings for a DBMS’s configuration knobs. In our conversation, we discussed how machine learning helps DBAs manage DBMSs:

Percona: How did you get into database technology? What do you love about it?

Dana: I got involved with research as an undergrad and ended up working on a systems project with a few Ph.D. students. It turned out to be a fantastic experience and is what convinced me to go for my Ph.D. I visited potential universities and chatted with many faculty members. I met with my current advisor at Carnegie Mellon University, Andy Pavlo, for a half hour and left his office excited about databases and the research problems he was interested in. Three years later, I’m even more excited about databases and the progress we’ve made in developing smarter auto-tuning techniques.

Percona: You’re presenting a session called “Automatic Database Management System Tuning Through Large-Scale Machine Learning”. How does automation make DBAs life easier in a DBMS production environment?

Dana: The role of the DBA is becoming more challenging due to the advent of new technologies and increasing scalability requirements of data-intensive applications. Many DBAs are constantly having to adjust their responsibilities to manage more database servers or support new platforms to meet an organization’s needs as they change over time. Automation is critical for reducing the DBA’s workload to a manageable size so that they can focus on higher-value tasks. Many organizations now automate at least some of the repetitive tasks that were once DBA responsibilities: several have adopted public/private cloud-based services whereas others have built their own automated solutions internally.

The problem is that the tasks that have now become the biggest time sinks for DBAs are much harder to automate. For example, DBMSs have dozens of configuration options. Tuning them is an essential but tedious task for DBAs, because it’s a trial and error approach even for experts. What makes this task even more time-consuming is that the best configuration for one DBMS may not be the best for another. It depends on the application’s workload and the server’s hardware. Given this, successfully automating DBMS tuning is a big win for DBAs since it would streamline common configuration tasks and give DBAs more time to deal with other issues. This is why we’re working hard to develop smarter tuning techniques that are mature and practical enough to be used in a production environment.

Percona: What do you want attendees to take away from your session? Why should they attend?

Dana: I’ll be presenting OtterTune, a new tool that we’re developing at Carnegie Mellon University that can automatically find good settings for a DBMS’s configuration knobs. I’ll first discuss the practical aspects and limitations of the tool. Then I’ll move on to our machine learning (ML) pipeline. All of the ML algorithms that we use are popular techniques that have both practical and theoretical work backing their effectiveness. I’ll discuss each algorithm in our pipeline using concrete examples from MySQL to give better intuition about what we are doing. I will also go over the outputs from each stage (e.g., the configuration parameters that the algorithm finds to be the most impactful on performance). I will then talk about lessons I learned along the way, and finally wrap up with some exciting performance results that show how OtterTune’s configurations compare to those created by top-notch DBAs!

My talk will be accessible to a general audience. You do not need a machine learning background to understand our research.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Dana: This is my first Percona Live conference, and I’m excited about attending. I’m looking forward to talking with other developers and DBAs about the projects they’re working on and the challenges they’re facing and getting feedback on OtterTune and our ideas.

Want to find out more about Dana and machine learning for DBMS management? Register for Percona Live Europe 2017, and see her talk Automatic Database Management System Tuning Through Large-Scale Machine Learning. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

Sep
18
2017
--

Percona Live Europe Featured Talks: Debugging with Logs (and Other Events) Featuring Charity Majors

Percona Live Europe 2017

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Charity Majors, CEO/Cofounder of Honeycomb. Her talk is Debugging with Logs (and Other Events). Her presentation covers some of the lessons every engineer should know (and often learns the hard way): why good logging solutions are so expensive, why treating your logs as strings can be costly and dangerous, how logs can impact code efficiency and add/fix/change race conditions in your code. In our conversation, we discussed debugging your database environment:

Percona: How did you get into database technology? What do you love about it?

Charity: Oh dear, I don’t. I hate databases. Data is the scariest, hardest part of computing. The stakes are highest and the mistakes the most permanent. Data is where you can kill any company with the smallest number of errors. That’s why I always end up in charge of the databases – I just don’t trust anybody else with the power. (Also, I’m an adrenaline junkie who gets off on high stakes. I could gamble or I could do databases, and I know too much math to gamble.) Literally, nobody loves databases. If they tell you anything different, they are either lying to you or they’re nowhere near production.

I got into databases from operations. I’ve been on call since I was 17, over half my life. I am really stubborn, have an inflated sense of my own importance and like solving problems, so operations was a natural fit. I started diving on the database grenades when I worked at Linden Lab and MySQL was repeatedly killing us. It seemed impossible, so I volunteered to own it. I’ve been doing that ever since.

Percona: You’re presenting a session called “Debugging with Logs (and Other Events)”. What is the importance of logs for databases and DBAs?

Charity: I mean, it’s not really about logs. I might change my title. It’s about understanding WTF is going on. Logs are one way of extracting events in a format that humans can understand. My startup is all about “what’s happening right now; what’s just happened?” Which is something we are pretty terrible at as an industry. Databases are just another big complex piece of software, and the only reason we have DBAs is because the tooling has historically been so bad that you had to specialize in this piece of software as your entire career.

The tooling is getting better. With the right tools, you don’t have to skulk around like a detective trying to model and predict what might be happening, as though it were a living organism. You can simply sum up the lock time being held, and show what actor is holding it. It’s extremely important that we move away from random samples and pre-aggregated metrics, toward dynamic sampling and rich events. That’s the only way you will ever truly understand what is happening under the hood in your database. That’s part of what my company was built to do.

Percona: How can logging be used in debugging to track down database issues? Can logging affect performance?

Charity: Of course logging can affect performance. For any high traffic website, you should really capture your logs (events) by streaming tcpdump over the wire. Most people know how to do only one thing with db logs: look for slow queries. But those slow queries can be actively misleading! A classic example is when somebody says “this query is getting slow” and they look at source control and the query hasn’t been modified in years. The query is getting slower either because the data volume is growing (or data shape is changing), or because reads can yield but writes can’t, and the write volume has grown to the point where reads are spending all their time waiting on the lock.

Yep, most db logs are terrible.

Percona: What do you want attendees to take away from your session? Why should they attend?

Charity: Lots of cynicism. Everything in computers is terrible, but especially so with data. Everything is a tradeoff, all you can hope to do is be aware of the tradeoffs you are making, and what costs you are incurring whenever you solve a given problem. Also, I hope people come away trembling at the thought of adding any more strings of logs to production. Structure your logs, people! Grep is not the answer to every single question! It’s 2017, nearly 2018, and unstructured logs do not belong anywhere near production.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Charity: My coauthor Laine and I are going to be signing copies of our book Database Reliability Engineering and giving a short keynote on the changes in our field. I love the db community, miss seeing Mark Callaghan and all my friends from the MongoDB and MySQL world, and cannot wait to laugh at them while they cry into their whiskey about locks or concurrency or other similar nonsense. Yay!

Want to find out more about Charity and database debugging? Register for Percona Live Europe 2017, and see her talk Debugging with Logs (and Other Events). Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

Sep
15
2017
--

This Week in Data with Colin Charles #6: Open Source Summit and Percona Live Europe

Colin Charles

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

What a long, packed week! Spent most of it at Open Source Summit North America, while still enjoying the myriad phone calls and meetings you have as a Perconian. In addition to two talks, I also gave a webinar this week on the differences between MySQL and MariaDB (I’ll post a blog Q&A in the near future).

Percona Live Europe Dublin

Have you registered for Percona Live Europe Dublin? If no, what’s keeping you from doing so?

In addition, I think it’s definitely worth registering for the community dinner. You can hang out with other like-minded folks, and see the lightning talks (we may announce more as time gets closer).

See what the MySQL Team will speak about at Percona Live Dublin. You’ll notice that a few of the releases I mention below have Percona Live Europe talks associated with them.

Releases

Link List

Feedback

On a somber note, former Perconian and all-round great community member Jaakko Pesonen passed away. Shlomi Noach commented online: Remembering Jaakko Pesonen.

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Sep
13
2017
--

Massive Parallel Log Processing with ClickHouse

ClickHouse

In this blog, I’ll look at how to use ClickHouse for parallel log processing.

Percona is known primarily for our expertise in MySQL and MongoDB (at this time), but neither is quite suitable for heavy analytical workloads. There is a need to analyze data sets, and a very popular task is crunching log files. Below I’ll show how ClickHouse can be used to efficiently perform this task. ClickHouse is attractive because it has multi-core parallel query processing, and it can even execute a single query using multiple CPUs in the background.

I am going to check how ClickHouse utilizes multiple CPU cores and threads. I will use a server with two sockets, equipped with “Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz” in each. That gives a total of 28 CPU cores / 56 CPU threads.

To analyze workload, I’ll use an Apache log file from one of Percona’s servers. The log has 1.56 billion rows, and uncompressed it takes 274G. When inserted into ClickHouse, the table on disk takes 9G.

How do we insert the data into ClickHouse? There are a lot of scripts to transform the Apache log format to CSV, which ClickHouse can accept. As a base, I used this one:

https://gist.github.com/sepehr/fff4d777509fa7834531

and you can find my modification here:

https://github.com/vadimtk/clickhouse-misc/blob/master/apachelog-to-csv.pl

The ClickHouse table definition:

CREATE TABLE default.apachelog (
  remote_host String,
  user String,
  access_date Date,
  access_time DateTime,
  timezone String,
  request_method String,
  request_uri String,
  status UInt32,
  bytes UInt32,
  referer String,
  user_agent String
) ENGINE = MergeTree(access_date, remote_host, 8192)

To test how ClickHouse scales on multiple CPU cores/threads, I will execute the same query by allocating from 1 to 56 CPU threads for ClickHouse processes. This can be done as:

ps -eLo cmd,tid | grep clickhouse-server | perl -pe 's/.* (\d+)$/\1/' | xargs -n 1 taskset -cp 0-$i

where $i is (N CPUs-1).

We must also take into account that not all queries are equal. Some are easier to execute in parallel than others. So I will test three different queries. In the end, we can’t get around Amdahl’s Law!

The first query should be easy to execute in parallel:

select extract(request_uri,'(\w+)$') p,sum(bytes) sm,count(*) c from apachelog group by p order by c desc limit 100

Speedup:

CPUs Time, sec Speedup to 1 CPU
1 823.646 1
2 413.832 1.990291
3 274.548 3.000007
4 205.961 3.999039
5 164.997 4.991885
6 137.455 5.992114
7 118.079 6.975381
8 103.015 7.995399
9 92.01 8.951701
10 82.853 9.941052
11 75.334 10.93326
12 69.23 11.89724
13 63.848 12.90011
14 59.388 13.8689
15 55.433 14.85841
16 52.158 15.79136
17 49.054 16.7906
18 46.331 17.77743
19 43.985 18.72561
20 41.795 19.70681
21 39.763 20.71388
22 38.031 21.65723
23 36.347 22.66063
24 34.917 23.58868
25 33.626 24.49432
26 32.42 25.40549
27 31.21 26.39045
28 30.135 27.33187
29 29.947 27.50346
30 29.709 27.72379
31 29.283 28.1271
32 28.979 28.42217
33 28.807 28.59187
34 28.477 28.9232
35 28.146 29.26334
36 27.921 29.49916
37 27.613 29.8282
38 27.366 30.09742
39 27.06 30.43777
40 26.817 30.71358
41 26.644 30.913
42 26.394 31.2058
43 26.215 31.41888
44 25.994 31.686
45 25.762 31.97135
46 25.554 32.23159
47 25.243 32.62869
48 25.102 32.81197
49 24.946 33.01716
50 24.668 33.38925
51 24.537 33.56751
52 24.278 33.92561
53 24.035 34.26861
54 23.839 34.55036
55 23.734 34.70321
56 23.587 34.91949

 

It’s much more interesting to chart these results:

From the chart, we can see that the query scales linearly up to 28 cores. After that, it continues to scale up to 56 threads (but with a lesser slope). I think this is related to the CPU architecture (remember we have 28 physical cores and 56 CPU “threads”). Let’s look at the results again. With one available CPU, the query took 823.6 sec to execute. With all available CPUs, it took 23.6 sec. So the total speedup is 34.9 times.

But let’s consider a query that allows a lesser degree of parallelism. For example, this one:

select access_date c2, count(distinct request_uri) cnt from apachelog group by c2 order by c2 limit 300

This query uses aggregation that counts unique URIs, which I am sure limits the counting process to a single shared structure. So some part of the execution is limited to a single process. I won’t show the full results for all 1 to 56 CPUs, but for one CPU the execution time is 177.715 sec, and for 56 CPUs the execution time is 11.564 sec. The total speedup is 15.4 times.

The speedup chart looks like this:

As we suspected, this query allows less parallelism. What about even heavier queries? Let’s consider this one:

SELECT y, request_uri, cnt FROM (SELECT access_date y, request_uri, count(*) AS cnt FROM apachelog GROUP BY y, request_uri ORDER BY y ASC ) ORDER BY y,cnt DESC LIMIT 1 BY y

In that query, we build a derived table (to resolve the subquery) and I expect it will limit the parallelism even further. And it does: with one CPU the query takes 183.063 sec to execute. With 56 CPUs it takes 28.572 sec. So the speedup is only 6.4 times.

The chart is:

Conclusions

ClickHouse can capably utilize multiple CPU cores available on the server, and query execution is not limited by a single CPU (like in MySQL). The degree of parallelism is defined by the complexity of the query, and in the best case scenario, we see linear scalability with the number of CPU cores. For scaling across multiple servers, see my previous post:

https://www.percona.com/blog/2017/06/22/clickhouse-general-analytical-workload-based-star-schema-benchmark/

However, if query execution is serial, it limits the speedup (as described in Amdahl’s Law).

With this example of a 1.5 billion record Apache log, we can see that ClickHouse executes complex analytical queries within tens of seconds.

Sep
12
2017
--

cscope: Searching Code Efficiently

cscope

In this post, we will discuss how to search code with the help of cscope. Let’s begin by checking its description and capabilities (quoting directly from http://cscope.sourceforge.net/):

Cscope is a developer’s tool for browsing source code.

  • Allows searching code for:
    • all references to a symbol
    • global definitions
    • functions called by a function
    • functions calling a function
    • text string
    • regular expression pattern
    • a file
    • files including a file
  • Curses based (text screen)
  • An information database is generated for faster searches and later reference
  • The fuzzy parser supports C, but is flexible enough to be useful for C++ and Java, and for use as a generalized ‘grep database’ (use it to browse large text documents!)

Of course, developers aren’t the only ones browsing the code (as implied by the tool’s description). In the Support team, we find ourselves having to check code many times. This tool is a great aid in doing so. As you can imagine already, this tool can replace find and grep -R "<keyword(s)>" *, and will even add more functionality! Not only this, but our searches run faster (since they are indexed).

The main focus of this post is to explore cscope’s searching capabilities regarding code, but note that you can also use it for text searches that aren’t linked to function names or symbols (supporting regular expressions) and for file searches. This also means that even if the tool doesn’t recognize a function name, you can still use the text search as a fallback.

There is an online manual page, for quick reference:

http://cscope.sourceforge.net/cscope_man_page.html

To install it under RHEL/CentOS, simply issue:

shell> yum install cscope

You can use cscope with MySQL, Percona Server for MySQL or MariaDB code alike. In my case, I had a VM with Percona Server for MySQL 5.7.18 already available, so I’ve used that for demonstration purposes.

We should first get the source code for the exact version we are working with, and build the cscope database (used by the tool to perform searches):

shell> wget https://www.percona.com/downloads/Percona-Server-LATEST/Percona-Server-5.7.18-15/source/tarball/percona-server-5.7.18-15.tar.gz
shell> tar xzf percona-server-5.7.18-15.tar.gz
shell> cd percona-server-5.7.18-15
shell> cscope -bR

-b will build the database only, without accessing the CLI; -R will recursively build the symbol database from the directory where it’s executed, downward. We can also add -q for fast symbol lookup, at the expense of a larger database (we’ll check how much larger below).

Now that we have built the cscope database, we will see a new file created: cscope.out. If we used -q, we will also see: cscope.in.out and cscope.po.out. Their sizes depend on the size of the codebase in question. Here are the sizes before and after building the cscope database (with -q):

shell> du -d 1 -h ..
615M ../percona-server-5.7.18-15
shell> cscope -bqR
shell> du -h cscope.*
8.2M cscope.in.out
69M cscope.out
103M cscope.po.out
shell> du -d 1 -h ..
794M ../percona-server-5.7.18-15

This gives around 30% increase in size while using -q, and around 10% increase without it. Your mileage may vary: be aware of this if you are using it on a test server with many different versions, or if the project size is considerably larger. It shouldn’t be much of a problem, but it’s something to take into account.

Ok, enough preamble already, let’s see it in action! To access the CLI, we can use cscope -d.

A picture is worth a thousand words. The following output corresponds to searching for the MAX_MAX_ALLOWED_PACKET symbol:

cscope

If there are multiple potential matches, the tool lists them for our review. If there is only one match, it will automatically open the file, with the cursor at the appropriate position. To check a match, either select it with the arrow keys and hit enter, or use the number/letter listed. When you are done and need to get back to cscope to continue checking other matches, simply exit the text editor (which can be defined by using CSCOPE_EDITOR). To get back to the main menu to modify the search, press CTRL-f. To exit the tool press CTRL-d. Lastly, CTRL-c toggles case insensitive mode on and off.

To show how the tool displays searches with many hits, let’s search for functions that call printf:

cscope

We can now see that letters are also used to list options, and that we can hit space to page down for more matches (from a total of 4508).

Lastly, as mentioned before, if everything else fails and you are not able to find the function or symbol you need (due to limitations or bugs), you can use the “Find this text string” and “Find this egrep pattern” functionality.

I hope this brief tour of cscope has been useful, and that it helps you get started using it. Note that you can use it for other projects, and it can be handy if you need to dive into the Linux kernel too.

Addendum

For even more power, you can read this vim tutorial (http://cscope.sourceforge.net/cscope_vim_tutorial.html), or set up ctags (http://ctags.sourceforge.net/) along with cscope.

Sep
12
2017
--

Upcoming Webinar September 14, 2017: Supercharge Your Analytics with ClickHouse

ClickHouse

Join Percona’s CTO Vadim Tkachenko @VadimTk and Altinity’s Co-Founder, Alexander Zaitsev as they present Supercharge Your Analytics with ClickHouse on Thursday, September 14, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).

 

ClickHouse is a real-time analytical database system. Even though it is only celebrating one year as open source software, it has already proved itself ready for serious workloads.

We will talk about ClickHouse in general, some of its internals and why it is so fast. ClickHouse works in conjunction with MySQL – traditionally weak for analytical workloads – and this presentation demonstrates how to make the two systems work together.

There will also be an in-person presentation on How to Build Analytics for 100bn Logs a Month with ClickHouse at the meetup Wednesday, September 13, 2017. RSVP here.

Alexander Zaitsev will also be speaking at Percona Live Europe 2017 on Building Multi-Petabyte Data Warehouses with ClickHouse on Wednesday, September 27 at 11:30 am. Use the promo code “SeeMeSpeakPLE17” for 15% off.

Alexander Zaitsev
Altinity’s Co-Founder
Alexander is a co-founder of Altinity. He has 20 years of engineering and engineering management experience in several international companies. Alexander is an expert in high-scale analytics systems design and implementation. He designed and deployed petabyte-scale data warehouses, including one of the earliest ClickHouse deployments outside of Yandex.

Vadim Tkachenko
CTO
Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products.

He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High Performance Group within the official MySQL support team. Vadim received a BS in Economics and an MS in computer science from the National Technical University of Ukraine. He now lives in California with his wife and two children.

Sep
11
2017
--

Updating InnoDB Table Statistics Manually

InnoDB Tables

In this post, we will discuss how to fix cardinality for InnoDB tables manually.

As a support engineer, I often see situations when the cardinality of a table is not correct. When InnoDB calculates the cardinality of an index, it does not scan the full table by default. Instead it looks at random pages, as determined by the options innodb_stats_sample_pages, innodb_stats_transient_sample_pages and innodb_stats_persistent_sample_pages, or by the CREATE TABLE option STATS_SAMPLE_PAGES. The default value for persistent statistics is 20. This approach works fine when the number of unique values in your secondary key grows in step with the size of the table. But what if you have a column that has a comparatively small number of unique values? This could be a common service, many-to-many relationship table, for example, or just a table containing a list of sell orders that belong to one of a dozen shops owned by the company. Such tables could grow up to billions of rows with a small (less than 100) number of unique shop IDs.
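If you want to check what your server is currently sampling with, the relevant settings are all visible with a plain SHOW VARIABLES (a quick aside, not part of the original example):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_stats%';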

At some point, InnoDB will report the wrong values for such indexes. Really! If 20 sampled pages contain 100 unique shop IDs, how many unique shop IDs would 20,000 pages have? 100 times 1,000? That extrapolation seems logical, and after a certain number of rows such indexes end up with extraordinarily large cardinality values.

ANALYZE TABLE will not help, because it uses the same algorithm. Increasing the number of “stats” sample pages would help, but it has its own downside: the more pages you have to examine, the slower ANALYZE TABLE runs. While this command is not blocking, it still creates side effects as described in this blog post. And the longer it runs, the less control you have.
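For completeness, here is what raising the sampling for a single table (rather than globally) looks like. This is a sketch with an arbitrary page count, using the goods table created later in this post:

alter table goods stats_sample_pages = 5000;
analyze table goods;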

Another issue with InnoDB statistics: even if they are persistent and STATS_AUTO_RECALC is set to 0, InnoDB still adds values for secondary indexes, as shown in lp:1538765. Eventually, after you insert millions of rows, your statistics get corrupted. ANALYZE TABLE can fix this only if you specify a very large number of “stats” sample pages.

Can we do anything about it?

InnoDB stores statistics in the “mysql” database, in the tables innodb_table_stats and innodb_index_stats. Since they are regular MySQL tables, privileged users can access them. We can update them and modify statistics as we like. And these statistics are used by the Optimizer!
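To anticipate where the example below ends up, the trick itself is nothing more than an UPDATE against those tables followed by a flush so the table is reopened with the new values. A minimal sketch, using the test.goods table and its shop_id index as created below (n_diff_pfx01 is the stored estimate of distinct values for the first column of the index):

update mysql.innodb_index_stats
   set stat_value = 100
 where database_name = 'test'
   and table_name = 'goods'
   and index_name = 'shop_id'
   and stat_name = 'n_diff_pfx01';
flush table goods;  -- reopen the table so the modified statistics are reloaded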

I created a small example showing how to do this trick. I used Percona Server for MySQL version 5.7.19, but the trick will work on any supported MySQL and Percona Server for MySQL version.

First, let’s create test tables. The first table has shops, with a few shop profiles with the shop ID and name:

create table shops(
  shop_id int not null auto_increment primary key,
  name varchar(32)
) engine=innodb;

The second table refers to the “shops” table:

create table goods(
  id int not null auto_increment primary key,
  shop_id int not null,
  name varchar(32),
  create_date datetime DEFAULT NULL,
  key (shop_id, create_date)
) engine=innodb;

Let’s check how many unique shops we have:

mysql> select count(distinct shop_id) from shops;
+-------------------------+
| count(distinct shop_id) |
+-------------------------+
| 100                     |
+-------------------------+
1 row in set (0.02 sec)

With 100 distinct shops, and a key on (shop_id, create_date), we expect the cardinality in table goods to be not much different from this query result:

mysql> select count(distinct id) as `Cardinality for PRIMARY`,
    -> count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`,
    -> count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id`
    -> from goods\G
*************************** 1. row ***************************
Cardinality for PRIMARY: 8000000
Cardinality for shop_id column in index shop_id: 100
Cardinality for create_date column in index shop_id: 169861
1 row in set (2 min 8.74 sec)

However, SHOW INDEX returns dramatically different values for the column shop_id:

mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7289724 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       13587 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      178787 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.09 sec)

ANALYZE TABLE

 does not help:

mysql> analyze table goods;
+------------+---------+----------+----------+
|      Table |      Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze |   status |       OK |
+------------+---------+----------+----------+
1 row in set (0.88 sec)
mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7765796 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       14523 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      168168 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

As a result, if we join the two tables, the Optimizer chooses the wrong

JOIN

 order and query execution plan:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| id | select_type | table | partitions |  type | possible_keys |     key | key_len |                ref | rows | filtered |                    Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
|  1 |      SIMPLE | shops |       NULL | index |       PRIMARY | PRIMARY |       4 |               NULL |  100 |   100.00 | Using where; Using index |
|  1 |      SIMPLE | goods |       NULL |   ref |       shop_id | shop_id |       4 | test.shops.shop_id |  534 |    11.11 |    Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.13 sec)
mysql> \P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (43.32 sec)

If compared to 

STRAIGHT_JOIN

 order:

mysql> explain select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| id | select_type | table | partitions |   type | possible_keys |     key | key_len |                ref |  rows | filtered |                 Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
|  1 |      SIMPLE | goods |       NULL |  range |       shop_id | shop_id |      10 |               NULL | 31997 |   100.00 | Using index condition |
|  1 |      SIMPLE | shops |       NULL | eq_ref |       PRIMARY | PRIMARY |       4 | test.goods.shop_id |     1 |   100.00 |           Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
2 rows in set, 1 warning (0.14 sec)
mysql> \P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (7.94 sec)

The time difference for a small 8M-row table is around six times (43.3 seconds versus 7.9 seconds)! For a big table with many columns, it would be even larger.

Is

STRAIGHT_JOIN

 the only solution for this case?

No! And it is not a great solution anyway: if the query is complicated and involves more than two tables, a forced join order may stop being optimal after Optimizer bug fixes and improvements in newer versions. You would then need to re-test such queries at each upgrade, including minor ones.

So why does

ANALYZE TABLE

 not work? Because the default number of pages it samples to calculate statistics is too small to capture the real distribution of values. You can increase the table option

STATS_SAMPLE_PAGES

  until you find a value that works. The drawback is that the larger you set 

STATS_SAMPLE_PAGES

, the longer it takes for 

ANALYZE TABLE

 to finish. Also, if you update a large portion of the table, you are often affected by lp:1538765. At some point, the statistics will again be inaccurate.
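For completeness, this is what the sample-size approach looks like. The value below is only an illustration; you would tune it to your table, and the larger it is, the longer ANALYZE TABLE takes:

-- per-table override of the 20-page default; larger samples mean a slower ANALYZE TABLE
alter table goods stats_sample_pages = 2048;
analyze table goods;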

Now let’s try our manual statistics update trick.

InnoDB stores its persistent statistics in the tables

mysql.innodb_table_stats

  and

mysql.innodb_index_stats

:

mysql> alter table goods stats_persistent=1, stats_auto_recalc=0;
Query OK, 0 rows affected (0.11 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from mysql.innodb_table_stats where table_name='goods';
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| database_name | table_name |         last_update |  n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
|          test |      goods | 2017-09-05 00:21:12 | 7765796 |                34624 |                    17600 |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='goods';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name |         last_update |    stat_name | stat_value | sample_size |                  stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 | n_diff_pfx01 |    7765796 |          20 |                                id |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages |      34484 |        NULL | Number of leaf pages in the index |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 |         size |      34624 |        NULL |      Number of pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_diff_pfx01 |      14523 |          20 |                           shop_id |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_diff_pfx02 |     168168 |          20 |               shop_id,create_date |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_diff_pfx03 |    8045310 |          20 |            shop_id,create_date,id |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_leaf_pages |      15288 |        NULL | Number of leaf pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 |         size |      17600 |        NULL |      Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)

And we can update these tables directly:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.18 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.08 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.08 sec)
Rows matched: 1 Changed: 1 Warnings: 0

I took index values from earlier, as calculated by this query:

select count(distinct id) as `Cardinality for PRIMARY`, count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`, count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id` from goods;
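If you prefer not to hard-code the numbers, the same updates can be driven by subqueries. Note that the COUNT(DISTINCT ...) subqueries scan the whole table, so this sketch is just as slow as the query above; it is shown only as an alternative:

update mysql.innodb_table_stats
set n_rows = (select count(*) from test.goods)
where database_name = 'test' and table_name = 'goods';

update mysql.innodb_index_stats
set stat_value = (select count(distinct shop_id) from test.goods)
where database_name = 'test' and table_name = 'goods'
  and index_name = 'shop_id' and stat_name = 'n_diff_pfx01';

update mysql.innodb_index_stats
set stat_value = (select count(distinct shop_id, create_date) from test.goods)
where database_name = 'test' and table_name = 'goods'
  and index_name = 'shop_id' and stat_name = 'n_diff_pfx02';

Either way, the statistics tables now hold the corrected values: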

mysql> select * from mysql.innodb_table_stats where table_name='goods';
+---------------+------------+---------------------+---------+----------------------+--------------------------+
| database_name | table_name |         last_update |  n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
|          test |      goods | 2017-09-05 00:47:45 | 8000000 |                34624 |                    17600 |
+---------------+------------+---------------------+---------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='goods';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name |         last_update |    stat_name | stat_value | sample_size |                  stat_description |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
|          test |      goods |    PRIMARY | 2017-09-05 00:48:32 | n_diff_pfx01 |    8000000 |          20 |                                id |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages |      34484 |        NULL | Number of leaf pages in the index |
|          test |      goods |    PRIMARY | 2017-09-05 00:21:12 |         size |      34624 |        NULL |      Number of pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:49:13 | n_diff_pfx01 |        100 |          20 |                           shop_id |
|          test |      goods |    shop_id | 2017-09-05 00:49:26 | n_diff_pfx02 |     169861 |          20 |               shop_id,create_date |
|          test |      goods |    shop_id | 2017-09-05 00:48:32 | n_diff_pfx03 |    8000000 |          20 |            shop_id,create_date,id |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 | n_leaf_pages |      15288 |        NULL | Number of leaf pages in the index |
|          test |      goods |    shop_id | 2017-09-05 00:21:12 |         size |      17600 |        NULL |      Number of pages in the index |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
8 rows in set (0.00 sec)

Now the statistics are up to date, but not used:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
| id | select_type | table | partitions |  type | possible_keys |     key | key_len |                ref | rows | filtered |                    Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
|  1 |      SIMPLE | shops |       NULL | index |       PRIMARY | PRIMARY |       4 |               NULL |  100 |   100.00 | Using where; Using index |
|  1 |      SIMPLE | goods |       NULL |   ref |       shop_id | shop_id |       4 | test.shops.shop_id |  534 |    11.11 |    Using index condition |
+----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+
2 rows in set, 1 warning (0.04 sec)

To finalize the changes, we need to run

FLUSH TABLE goods

:

mysql> FLUSH TABLE goods;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
| id | select_type | table | partitions |   type | possible_keys |     key | key_len |                ref |  rows | filtered |                 Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
|  1 |      SIMPLE | goods |       NULL |  range |       shop_id | shop_id |      10 |               NULL | 31997 |   100.00 | Using index condition |
|  1 |      SIMPLE | shops |       NULL | eq_ref |       PRIMARY | PRIMARY |       4 | test.goods.shop_id |     1 |   100.00 |           Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+
2 rows in set, 1 warning (0.28 sec)
mysql> \P md5sum
PAGER set to 'md5sum'
mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486);
4a94dabc4bfbfb7dd225bcb50278055b -
31896 rows in set (7.79 sec)

Now everything is good.

But

FLUSH TABLE

 is a blocking operation, right? Won’t it block queries and create a scenario even worse than the one described in the ANALYZE TABLE blog post?

At first glance this is true. But we can use the same trick Percona Toolkit uses: set

lock_wait_timeout

 to 1 and call

FLUSH

 in a loop. To demonstrate how it works, I use a scenario similar to the one described in the

ANALYZE TABLE

 blog post.

First, let’s reset the statistics to ensure our

FLUSH

 works as expected:

mysql> analyze table goods;
+------------+---------+----------+----------+
|      Table |      Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze |   status |       OK |
+------------+---------+----------+----------+
1 row in set (0.38 sec)
mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7765796 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       14523 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      168168 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

And then update

mysql.innodb_*_stats

 tables manually, and check that the Optimizer still sees the outdated statistics:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.09 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.11 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.10 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     7765796 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |       14523 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      168168 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Now let’s start a long-running query in one session that blocks our

FLUSH TABLE

 command:

mysql> select sleep(1) from goods limit 1000, 300;

And let’s run

FLUSH TABLE

 in a loop:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ until (`mysqlmtr -P13001 -e "set lock_wait_timeout=1; flush table goods;" test`); do sleep 1; done
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
...

Now let’s check whether we can still access the table:

mysql> select * from goods order by id limit 10;
^C

We cannot! We cannot even connect to the database where the table is stored:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysqlmtr -P13001 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
^C

The reason for this is that even though each

FLUSH TABLE

 attempt was eventually killed by the metadata lock wait timeout, while it was waiting it had already requested an exclusive lock on the table, and that pending request blocked all other incoming queries.
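If you want to watch this happen, performance_schema exposes metadata locks (the wait/lock/metadata/sql/mdl instrument is enabled by default in 5.7). Something along these lines, run from another session connected with -A or to a different schema, shows the pending exclusive request of FLUSH TABLE and the queries queued behind it:

select object_schema, object_name, lock_type, lock_duration, lock_status, owner_thread_id
from performance_schema.metadata_locks
where object_schema = 'test' and object_name = 'goods';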

But we can wrap

FLUSH TABLE

 in

LOCK TABLE ... WRITE; ... UNLOCK TABLES;

 operations. In this case, the 

LOCK TABLE

 command is blocked until all running queries release their metadata locks on the table. Then it locks the table exclusively,

FLUSH TABLE

 runs, and the script immediately unlocks the table afterwards. Since closing the session causes an implicit unlock, I used a PHP one-liner to keep everything in a single session:

$ php -r '
> $link = new mysqli("127.0.0.1", "root", "", "test", 13001);
> $link->query("set lock_wait_timeout=1");
> while(!$link->query("lock table goods write")) {sleep(1);}
> $link->query("flush table goods");
> $link->query("unlock tables");'

We can confirm that a parallel session can still access the table while the script loops:

mysql> select * from goods order by id limit 10;
+----+---------+----------------------------------+---------------------+
| id | shop_id |                             name |         create_date |
+----+---------+----------------------------------+---------------------+
|  1 |      58 | 5K0z2sHTgjWKKdryTaniQdZmjGjA9wls | 2015-09-19 00:00:00 |
|  2 |      17 | xNll02kgUTWAFURj6j5lL1zXAubG0THG | 2013-10-19 00:00:00 |
|  3 |      30 | clHX7uQopKmoTtEFH5LYBgQncsxRtTIB | 2017-08-01 00:00:00 |
|  4 |      93 | bAzoQTN98AmFjPOZs7PGfbiGfaf9Ye4b | 2013-02-24 00:00:00 |
|  5 |      20 | rQuTO5GHjP60kDbN6WoPpE2S8TtMbrVL | 2017-08-05 00:00:00 |
|  6 |      37 | WxqxA5tBHxikaKbuvbIF84H9QuaCnqQ3 | 2013-10-18 00:00:00 |
|  7 |      13 | DoYnFpQZSVV8UswBsWklgGBUc8zW9mVW | 2017-02-06 00:00:00 |
|  8 |      81 | dkNxMQyZNZuTrONEX4gxRLa0DOedatIs | 2015-07-05 00:00:00 |
|  9 |      12 | Z0t2uQ9itexpPf01KUpa7qBWlT5fBmXR | 2014-06-25 00:00:00 |
| 10 |      90 | 6urABBQyaUVVyxljvd11D3kUxbdDRPRV | 2013-10-23 00:00:00 |
+----+---------+----------------------------------+---------------------+
10 rows in set (0.00 sec)
mysql> update goods set name='test' where id=100;
Query OK, 1 row affected (0.08 sec)
Rows matched: 1 Changed: 1 Warnings: 0

After the PHP script finishes its job, the statistics are corrected:

mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 |  PRIMARY |            1 |          id |         A |     8000000 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            1 |     shop_id |         A |         100 |     NULL |   NULL |      |      BTREE |         |               |
| goods |          1 |  shop_id |            2 | create_date |         A |      169861 |     NULL |   NULL |  YES |      BTREE |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Conclusion

We can manually update persistent InnoDB statistics to fix Optimizer plans for our queries, with almost no impact on a live server.
