Apr
23
2014
--

So you want to be an author…but…

Everyone has a novel in them. Or so the saying goes. Friends and colleagues often approach me, sometimes sheepishly, about their desire to write a book, or problems they are having in getting started. Well, it’s fantastic news that you want to write a novel! Go for it.

Here are just a few of the questions I’ve been asked. I hope my answers are useful.

1. I have a great book idea but feel it’s been done to death
Most things have been done to death. There is little new under the Sun. There are only so many plots and character types. I’m generalizing, but few ideas are ground-breakingly original. Most are a combination of other ideas assembled in a new way, or from a unique perspective, or with an unexpected twist. Take romances for example, probably the most successful and popular genre ever. There must be hundreds of thousands of books about girl meets boy, girl loses boy, either to find him again or find another, truer love. Tall, dark-haired, emotionally strong, idyllic men feature in most stories, as do plenty of Mr. Darcys. There are sweet romances by the dozen, hot steamy affairs, love triangles, unhappy marriages, happy marriages… you name it. If you read this genre, I bet you could name several dozen examples of everything I just listed. So has romance been done to death? Not judging by the thousands of romance books published each year.

Take fantasy: How many books can you name that feature a quest for a powerful magic item, usually one that will save the kingdom or world? Isn’t there always a young farm lad who has a prophesied destiny or secret talent that he learns from an old wizard? Isn’t there always a group of men, elves and dwarves on this quest, and usually one of them is a wizard, one is a knight or paladin and there is some kind of rogue or ninja-like character? Sound familiar? Done to death, but extremely popular.

In your own writing, look for ways to make these themes, or tropes, your own. Flip them, modify them, surprise the reader. What if the paladin has fallen from grace? What if the elf finds out that the dwarf killed his brother? What if the magic item is a MacGuffin, a decoy? Let your imagination run wild – don’t be afraid. Even if it feels cliched and well-worn as you write the first draft, once you get your creative juices flowing you’ll start having all sorts of cool ideas. Try them, run with them. Trust your instincts. Before you know it, that quest or romance will be stamped with your own unique ideas and voice. Trust the writing process. I often find that my first drafts lack the depth or originality that I hoped for, but by the time I am ready to rewrite and edit, my head is buzzing with what-ifs and wouldn’t-it-be-cool-ifs, and the story comes more alive with each draft. I think most authors go through this.

Remember, too, that a certain familiarity is what attracts a reader. Why are there so many quest books? Because readers love that plot. Why does the young woman in a romance fall in love with the bad guy against everyone’s advice? Because many readers identify with that issue. Write what people enjoy reading, but make it your own version.

2. How do I start?
Woo, this is a common question and the answer is: anywhere. Trite but true. Writing a novel is an immense and daunting mountain of a task. It’s not surprising that so many budding authors cower at the foot of this obstacle with no idea how to begin. Every journey begins with the first step. Eat an elephant one bite at a time. What these cliches tell us is that any start whatsoever helps us overcome the inertia of our fears. Figure out how you want your story to start and try to write that scene. Don’t worry yet if it is the ideal place to start, or if it even makes sense. Just start writing. OK, what happens next? Then what? Then what? What problem does your protagonist have at the start of your book? Show her trying to deal with that. Perhaps she stumbles. Why? Who or what gets in her way? Who helps her?

Unless you have a firm outline of your story in your head, you just need to start – anywhere – and write whatever comes to you. You might discard these early chapters, but don’t worry about that yet. You have to get your mind into the flow of writing. You have to give it some substance to mull over, some ideas to work with. Trust me, if you just start writing, things will develop. Your mind isn’t used to playing the what-if game yet, so you have to train it. If you find yourself slowing or grinding to a halt, just ask some questions: What would she do next? Should she go down into that cellar or call her friend? What if the lights went out? Keep driving forward. Keep writing. Don’t worry about polish, don’t worry about word choice, just let it flow. Get your ideas down. The first draft is a raw dump of ideas – a giant sandbox for you to play with. Don’t fear the lack of direction. Embrace not being tied down.

3. I want to write so badly but don’t know what to say
There is a misconception about writers that they lounge around coffee shops until the muse strikes them and then they bang out a novel non-stop in a weekend. We wish. I’ve drunk way too much Starbucks waiting for my muse! Maybe she’s a tea drinker. First off, you must have some idea of what you want your book to be about, at least the genre. No? Think about the book you’d most love to read. Maybe it’s like that bestseller by ‘blah blah’ that you wish had ended differently. Why do political thrillers always go to the brink of nuclear war and then make peace, when you’d like to know what would have happened if the nukes went flying? Maybe you lament that there are too many vampire books but not enough about unicorns?

The reality is that most muses only help those who help themselves. Consider my advice for #2 above. It applies in this situation too. If you start writing anything at all, you will likely find your muse peering over your shoulder before too long, whispering ideas to you. Alas, too many people never write because: “I’ll write when I’m inspired”. Flip that thought. You’ll be inspired when you write. Writing is a proactive creative process – it requires that you take action. Writers write. Writers make things happen. You wouldn’t think of sitting at home every day and waiting for your future spouse just to ring the doorbell one day. Nor would you expect the Lottery folks to just mail you a check out of the blue. You have to put forth effort to reap the rewards. Trust your subconscious. Start writing anything, even if it’s just a story about a cat walking around the garden. Exercise your creative muscles and then ideas will flow – probably faster than you can get them down!

4. I don’t understand all this publishing jargon, self publishing and formatting, so I’m scared to start writing
Slow down there, Tex! You’re putting the cart way before the horse. That’s like worrying about replacing your tires on the day you buy a brand new car, or that you might burn your bread before you even make the dough. Put those things out of your mind right now. Plenty of time to learn about such things later. Much later. When you get that far, you’ll wonder why you worried because our distant fears are always more menacing than the reality.

Trust that the writing process works. It has done for generations. Concentrate on writing the book. That’s more than enough to occupy your mind for a while, trust me. Before you finish that first draft, you’ll have gained (one way or another) the knowledge of how to revise and edit it. Long before you grow tired of editing it, you will figure out what publishing route works for you and start to acquire contacts, critique-partners, editors, agents, cover designers, and what have you. But right now, forget all that. None of that matters until you write the best book you can. Don’t rush to get to those later stages. All in good time. Right now, simply concentrate on writing your story.

5. I keep getting stuck when my writing goes wrong and I have to start over
This is usually because you are overthinking your first draft as you write it. It’s very tempting to read over your last page or chapter and wrinkle your nose in disgust. What a pile of poo. Now you feel compelled to go back and fix it, edit it, polish it, change the dialog, etc. The trouble is that now you’ve taken yourself out of the flow of writing and put yourself into editing mode, and it’s too soon for that when you are writing your first draft. Now you’re going to be nervous to continue, because you’re afraid to write more drivel like the chapter you just spent days cleaning up.

Another possibility is that you write yourself into a corner where your plot goes wrong, or your character does something you didn’t plan on, or you just don’t know what happens next, or you changed your mind and have a much better idea than the one you spent hours or days writing. So you go back and rewrite it “the proper way”, fixing your problems. Great! Except that you write a bit further and it happens again. So you go back once more and change it. I’ve known writers to spend months and months rewriting the first 40 pages over and over until they get frustrated with the whole writing business. Please don’t let that happen to you!

Here’s the thing… you need to accept that your first draft will be junk. Go on, say it. Accept it. Believe it. You’ll have to one day, so do yourself a favor and accept it now. Almost every successful author will admit that their first drafts are junk. It’s part of the process. You can’t write a polished story out of the gate. The purpose of the first draft is to blast down all those wonderful ideas in your head, to lay down the foundation of the scenes, roughly in the right order, with the right characters and getting as much of the plot and dialogue down as you can. It’s a framework. A starting point. Here’s another truth: You will make mistakes. You will write yourself into a corner. You will discover huge holes in your plot. You will write wooden characters, cliched dialogue, use horrible adverbs, write verbose and passive statements.

You have permission to do all of that on your first draft, because it doesn’t matter. No, really, it doesn’t. Editing and rewriting is where the real magic happens, and you can’t reach that stage until you have your story down. All of it down. As best you can. So now you understand why you must not start over on the first draft, just keep going forward. Make notes about things to rewrite, things that are broken, but don’t fix them yet. If you can train yourself to write your first draft in this way, you won’t start over and you won’t get stuck.

6. How do I find time to write? I’m so busy
Some people are lucky enough to be able to write all day, or for hours at a time. From the question, I’m assuming you’re not one of those people. Many new authors are not either. We all have families, day jobs and responsibilities. Writing falls low on the totem pole of things to get done each precious day. But you can write a novel in 30 minutes a day, even 10 minutes a day. Many writers rarely get down more than 500 words a day, but it all adds up. I’ve heard of bestselling authors who write on a bench watching their kid at soccer practice, or while their kids are doing homework. One enterprising guy wrote an entire novel on the subway to and from work. Entirely on his cellphone!

Don’t make the mistake of waiting until “one day” when you have hours to devote to your novel. That time may never come. I bet you make time for your favorite TV show, or for that cup of Joe at Starbucks, or to walk the dog. So too can you make time for your writing. You have to make it a priority. Squeeze in time where you can, or cut out something you can do without. This may mean making a pact with your family, like “8pm to 9pm is daddy’s writing time. You can have my attention all day except this hour.” These schemes might not be ideal, but they’re infinitely better than the alternative of not writing at all. No one is busy 24 hours a day. Good luck!
 

If you have other questions or want further advice or tips, don’t hesitate to contact me. Ask away! I don’t bite.

Aug
29
2012
--

Here’s a quick way to Foresee if Replication Slave is ever going to catch up and When!

If you have ever had a replication slave that is severely behind, you probably noticed that it’s not catching up with a busy master at a steady pace. Instead, the “Seconds behind master” is going up and down so you can’t really tell whether the replica is catching up or not by looking at just a few samples, unless these are spread apart. And even then you can’t tell at a glance when it is going to catch up.

Normally, the “severely behind” thing should not happen, but it does often happen in our consulting practice:

  • sometimes replication would break and then it needs to catch up after it is fixed,
  • other times a new replication slave is built from a backup which is normally hours behind,
  • or, it could be that the replication slave became too slow to catch up due to a missing index.

Whatever the case is, the single question I am being asked by the customer every time this happens is this: “When is the replica going to catch up?”

I used to tell them “I don’t know, it depends..” and indeed it is not an easy question to answer. There are a few reasons catching up is so unstable:

  1. If you have restarted the server, or started a new one, caches are cold and there’s a lot of IO happening,
  2. Not all queries are created equal – some would run for seconds, while others can be instant,
  3. Batch jobs: some sites would run nightly tasks like building statistics tables or table checksum – these are usually very intense and cause the slave to back up slightly.

I didn’t like my own answer to The question, so I decided to do something about it. And because I love awk, I did that something in awk:

delay=60   # seconds between samples
cmd="mysql -e 'show slave status\G' | grep Seconds_Behind_Master | awk '{print \$2}'"
while sleep "$delay"; do
  eval "$cmd"
done | awk -v delay="$delay" '
{
   passed += delay;                        # seconds since monitoring started
   if (count%10==0)                        # repeat the header every 10 rows
      printf("s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h\n");
   if (count==0){                          # first sample: use it as the baseline
      prev = $1;
      start = $1;
   }
   speed = (prev-$1)/delay;                # catch-up speed over the last interval
   o_speed = (start-($1-passed))/passed;   # overall speed since the first sample
   if (speed == 0)    speed_d = 1;         # avoid dividing by zero
     else             speed_d = speed;
   eta = $1/speed_d;
   if (eta<0)         eta = -86400;        # lag grew: shown as -1.000 days
   o_eta = $1/o_speed;
   printf("%8d %8.6f %9.3f %7.3f | %9.3f %7.3f %7.2f\n",
      $1, $1/86400, speed, eta/86400, o_speed, o_eta/86400, o_eta/3600);
   prev=$1;
   count++;
}'

I don't know if this is ever going to become part of Percona Toolkit; however, since it's pretty much a one-liner, I just keep it in my snippets pool for easy copy'n'paste.

Here's a piece of an output from a server that was almost 27 days behind just yesterday:

// at the beginning:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 2309941 26.735428     0.000  26.735 |     1.000  26.735  641.65
 2309764 26.733380     2.950   9.062 |     2.475  10.801  259.23
 2308946 26.723912    13.633   1.960 |     6.528   4.094   98.25
 2308962 26.724097    -0.267  -1.000 |     5.079   5.262  126.28
 2309022 26.724792    -1.000  -1.000 |     4.063   6.577  157.85
...
// after one hour:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 2264490 26.209375    39.033   0.671 |    13.418   1.953   46.88
 2262422 26.185440    34.467   0.760 |    13.774   1.901   45.63
 2261702 26.177106    12.000   2.181 |    13.762   1.902   45.65
...
// after three hours:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 2179124 25.221343    13.383   1.885 |    13.046   1.933   46.40
 2178937 25.219178     3.117   8.092 |    12.997   1.940   46.57
 2178472 25.213796     7.750   3.253 |    12.973   1.943   46.64
...
// after 12 hours:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 1824590 21.117940    20.233   1.044 |    12.219   1.728   41.48
 1823867 21.109572    12.050   1.752 |    12.221   1.727   41.46
 1823089 21.100567    12.967   1.627 |    12.223   1.726   41.43
...
// after 21 hours:
s_behind  d_behind   c_sec_s   eta_d | O_c_sec_s O_eta_d O_eta_h
 1501659 17.380312    -0.533  -1.000 |    11.768   1.477   35.44
 1501664 17.380370    -0.083  -1.000 |    11.760   1.478   35.47
 1501689 17.380660    -0.417  -1.000 |    11.751   1.479   35.50
...

Of course, it is still not perfectly accurate and it does not account for any potential changes in queries, workload, warm-up, nor the time it takes to run the mysql cli, but it does give you an idea of the direction the replication slave is going. Note that negative values mean replication isn't catching up, but the values themselves are mostly meaningless.

Here's what the weird acronyms stand for:

  • s_behind - current Seconds_Behind_Master value
  • d_behind - number of days behind based on current s_behind
  • c_sec_s - how many seconds per second were caught up during the last interval
  • eta_d - ETA based on the last interval (in days)
  • O_c_sec_s - overall catch-up speed in seconds per second
  • O_eta_d - ETA based on overall catch-up speed (in days)
  • O_eta_h - same as the previous one, but in hours
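
To make the arithmetic behind those columns easier to follow, here is a rough Python sketch that mirrors what the awk block computes for a single sample. The function and field names are mine, not part of the script, and the guard against a zero overall speed is an added assumption (the awk version would simply fail on a division by zero there). Note that the overall speed, as the script calculates it, adds the elapsed time back in, so it counts replayed seconds per second rather than pure lag reduction.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    speed: float        # c_sec_s: lag seconds caught up per second, last interval
    eta_days: float     # eta_d: -1.0 when the lag grew during the interval
    o_speed: float      # O_c_sec_s, as the script computes it
    o_eta_days: float   # O_eta_d
    o_eta_hours: float  # O_eta_h


def estimate(start_lag: float, prev_lag: float, lag: float,
             passed: float, delay: float) -> Estimate:
    """Mirror the arithmetic of the awk block for one sample."""
    speed = (prev_lag - lag) / delay
    # Elapsed time is added back in, so this is replayed seconds per second.
    o_speed = (start_lag - (lag - passed)) / passed
    eta = lag / (speed if speed != 0 else 1)
    if eta < 0:
        eta = -86400  # printed as -1.000 days: not catching up
    # Assumption (not in the script): report "never" instead of dividing by zero.
    o_eta = lag / o_speed if o_speed != 0 else float("inf")
    return Estimate(speed, eta / 86400, o_speed, o_eta / 86400, o_eta / 3600)


# Second row of the sample output above: two 60-second samples in.
row = estimate(start_lag=2309941, prev_lag=2309941, lag=2309764,
               passed=120, delay=60)
print(f"{row.speed:.3f} {row.eta_days:.3f} | "
      f"{row.o_speed:.3f} {row.o_eta_days:.3f} {row.o_eta_hours:.2f}")
# prints: 2.950 9.062 | 2.475 10.801 259.23
```

The printed values line up with the second data row of the output shown earlier, which is a handy sanity check if you ever port the snippet.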

Let me know if you ever find this useful.

Aug
28
2012
--

Sell-an-Elephant-to-your-Boss-HOWTO

Spoiler alert: If your boss does not need an elephant, he is definitely NOT going to buy one from you. If he does, he will regret it and eventually you will too.

I must apologize to the reader who was expecting to find advice on selling useless goods to his boss. While I do use a similar technique to get a quarterly raise (no, I don’t), this article is actually about convincing your team, your manager or anyone else who has influence over the project’s priorities that pending system performance optimizations are a priority (assuming they indeed are). However, this headline was not very catchy and way too long, so I decided to go with the elephant instead.

System performance optimization is what I do day to day here at Percona. Looking back at the duration of an optimization project, I find that with bigger companies (bigger here means it’s not a one-man show) it’s not the identification of performance problems that takes most of the time. Nor is it looking for the right solution. The biggest bottleneck in an optimization project is getting the solution approved and prioritized appropriately inside the company that came for performance optimization in the first place. Sometimes I would follow up with the customer after a few weeks or a month just to find that nothing was done to implement the suggested changes. When I would ask why, most of the time the answer is something along these lines: my manager didn’t schedule/approve it yet.

I don’t want to say that all performance improvements are a priority and should be done right away, not at all. I want to suggest that you can check if optimizations at hand should be prioritized and if so – how to make it happen if you’re not the one who sets priorities.

1. Estimate harm being done

This is the hardest and the most important part of this. The Question that you have to answer is this: How much does it cost your company NOT to have the optimization in place?

This should be expressed in dollars per amount of time (e.g. month). Normally, a Percona consultant would tell you how much faster something is going to be when changes are applied. When we can measure it, it will be an accurate number; other times we estimate it (e.g. when buying new hardware or making significant changes to database architecture is needed). What Percona can’t do, though, is map that to dollars for your company, so this is something you will have to work out on your own. These questions might help you:

  • How many users are we losing [per month] because they.. (“get very unstable response times”, “get timeout for every 10th request”, “need to wait 10s for page to load”)? How much does losing one customer cost us? Then, multiply.
  • How much more efficient could X dept. be if they got this report in 5min rather than 4h? Would that give any $$ value to the company? How much per month?
  • How many extra servers do we need to run because our systems are not optimized? How much does that cost our company every month?
  • How many conversions are we losing because users turn away due to slow service during registration?

It may be hard to get some of these answered if the company is not very transparent, and you may have to go asking around and sometimes guess. That’s fine though, just keep a record of the guess you made so you can recalculate the whole thing easily once you have the number secured. In other cases there’s no way to get a good enough guess as the data is just way too far from you. In that case you either have to make a very rough guess or accept that there’s nothing you can do about it and that you don’t know how important the work you do is for the company.

By the way, if someone says users don’t care if the website loads in 2s or in 5s, let him read this.

In the end you have to come up with one number. That number will vary greatly depending on many things like size of the company, number of servers system runs on, number of users, importance of the system we are working on in the global picture and what not.

2. Estimate the cost of the solution

Now comes the question of how much it costs to implement a given optimization.

In some cases it’s just a matter of opportunity costs: how much more valuable is it for the company that you work on implementing performance improvement -vs- that other thing? In other cases, you have to buy extra hardware and then see when (and if) the optimization is going to pay off.

When you have the two numbers, you can clearly see what is more beneficial to the company – get the performance improvements implemented now or leave things as they are and just work on the next thing. A few examples I just came up with:

(a) Company “Find that thing” runs a specialized search service with 25k new users monthly. 20% of users cancel their subscription after the first month because sometimes the search is so painfully slow that requests time out. Because of that, the company is losing $60k/month. Search stability can be improved to all 6 sigmas if you just had 2 extra servers for search, but that is $20k extra for the servers and $1k/month for colocation. Well, that means the company is going to lose ~$39k this month if nothing gets done (the $60k loss minus the $21k the fix would cost in its first month) and $59k or more month after month until it gets fixed.

(b) Company “Ana Lysis” sells data analysis services. Each customer report takes approx. 2h to generate, so one 24-disk server can only run ~300 reports a day and needs to queue them so no more than 24 such requests run in parallel. Since one customer can run 5 reports in parallel, with no overbooking they can only host 60 users on a single such server, and even then, due to queueing, response time would be unstable during peak times. For every 60 new customers, “Ana Lysis” needs to buy a new 24-disk machine which costs $25k. At the moment there are 6 such machines and a plan to buy 4 new ones next month. By moving to a different storage engine, average report generation time would drop to 10 minutes, hence one server would now be able to serve 12 times more customers, all other things being equal, so not only will the new hardware not be needed, but the current hardware will be underloaded as well. You will save your company at least $100k. And if you know how many new customers come in every month, you can also get the $/month figure.
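
If you want to keep your guesses in one place so they are easy to recalculate later, the arithmetic from the two examples fits in a few lines of Python. This is only a sketch: the function name and parameters are made up for illustration, and the per-user figure is simply what the numbers in example (a) imply.

```python
def waste_from_waiting(monthly_loss, upfront_cost, monthly_running_cost, months):
    """Net money wasted per month while the fix is postponed.

    The one-off cost of the fix is charged against the first month only.
    """
    wasted = []
    for month in range(months):
        fix_cost = monthly_running_cost + (upfront_cost if month == 0 else 0)
        wasted.append(monthly_loss - fix_cost)
    return wasted


# (a) "Find that thing": $60k/month lost, $20k for servers, $1k/month colocation.
print(waste_from_waiting(60_000, 20_000, 1_000, months=3))
# prints: [39000, 59000, 59000]

# Implied value of one cancelled subscription in (a): 20% of 25k users is 5k users.
loss_per_user = 60_000 / (25_000 * 0.20)   # 12.0 dollars per month

# (b) "Ana Lysis": reports drop from 2h to 10 minutes, 4 planned machines at $25k.
speedup = (2 * 60) / 10                    # 12.0 times more reports per server
avoided_purchase = 4 * 25_000              # 100000 dollars not spent next month
print(loss_per_user, speedup, avoided_purchase)
```

Swap in your own numbers as your guesses get replaced by real figures.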

3. Make it a short and clear statement

Now that you know the most important numbers, go to your manager or whoever sets project priorities and make it clear:

“Our company is wasting $59k every month and I know exactly how to fix this.”

While you may sound funny at first if it’s the first time you talk about money other than your salary, it should still get you attention – business is all about making money and no business likes to be wasting money. Most managers know this and they normally welcome a chance to save money. However, beware that different orders of magnitude are relevant for different businesses, so don’t be surprised if it turns out that you bring in more than $59k a month to the company by working on that other thing.

Assuming your manager happens to be worried about the $59k/month loss, now it is your turn to show exactly how the company is wasting that much money and what to do about it.

4. Show the method, 5. the main problem and 6. the solution

Having the attention, I would try to be very thorough here and explain everything from top to bottom, starting with how it came to light that there is a problem at all, how the performance review was done, what the findings were, what the impact of other performance problems found is and then, most importantly, how you came up with the $59k/month figure and what your assumptions and guesses were. Of course, don’t make it a 3h explanation – if you can run through this within 15-20 minutes, that’s perfect.

Once you’re clear on that, it’s time to present the solution, why you think it makes sense and how much you think it will cost – either in your hours or in company’s dollars.

7. Overcome any obstacles

At this point it is quite likely that your manager will come up with a few or lots of reasonable arguments against the problem or the solution. Don’t get discouraged – most of the time you have already thought about it and found a solution; you will just have to do it again if you didn’t take notes. On the other hand, managers often see the broader picture and they know more than you think they do, therefore it is very important that you (either on your own or together with your manager) actually acknowledge and remove every possible obstacle or it will be a show-stopper.

8. Kick it off

If you were able to overcome all the obstacles and leaving the problem unfixed would still mean a significant waste of money, it’s time to go ahead and get that thing fixed.

Summary

I know this all sounds like a lot of work and indeed it is, however it is very rewarding as you focus on what is the most important thing for virtually any business: making (or saving) money.

Also note that Percona can do a lot of heavy lifting for you. The method that we use – Goal driven performance optimization – is based on the same value mindset: you tell us what would give you the most value and we will help you get there, while you’re busy rolling out that new feature or launching the new project.

Of course, there are certain things we can’t do, like re-architecting the application, but often the question is moving to better hardware, optimizing indexes or queries, tuning MySQL etc., of which we can offload 80-90% from the team. On the other hand, if the application is so bad it needs to be redesigned, we are going to let you know.

Apr
07
2011
--

Optimizing slow web pages with mk-query-digest

I don’t use many tools in my consulting practice but for the ones I do, I try to know them as best as I can. I’ve been using mk-query-digest for almost as long as it has existed but it continues to surprise me in ways I couldn’t imagine it would. This time I’d like to share a quick tip on how mk-query-digest allows you to slice your data in a completely different way than it would by default.

Disclaimer: this only works when persistent connections or connection pools aren’t used and is only accurate when a single MySQL connection is used during execution of a request.

If you are seeking to reduce the load on the database server and [as a result] improve response time for some random user requests, you are usually interested in queries that are consuming most MySQL time and that’s how mk-query-digest groups and orders data by default. Fixing the top 10 queries on the list will indeed most likely reduce the load and improve response time for some requests. What if some pages are still slow to load because of the time spent in the database and you either can’t or don’t want to profile or debug the application to figure out what’s happening under the hood?

That sounds like something I was working on today – I had a slow query log (captured with long_query_time=0 and all the eXtra benefits from Percona slow query log patch), I knew some particular pages were taking minutes to load and that’s exactly what the customer asked me to focus on. So instead of using mk-query-digest to list the slowest queries, I asked it to list the slowest sessions:

mk-query-digest --group-by=Thread_id --order-by=Query_time:sum in > out

Spot on, the session I needed to focus on was right at the top. And what do you know, 519 queries were run during that session, which took 148 seconds overall:

# ########################################################################
# Report grouped by Thread_id
# ########################################################################

# Item 1: 3.41 QPS, 0.97x concurrency, ID 0xABCE5AD2A2DD1BA1 at byte 288124661
# This item is included in the report because it matches --limit.
# Scores: Apdex = 0.97 [1.0], V/M = 19.02
# Query_time sparkline: | ^______|
# Time range: 2011-04-05 16:12:13 to 16:14:45
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          0     519
# Exec time      2    148s    11us     33s   285ms    53ms      2s    26us
# Lock time      0     5ms       0   334us     9us    66us    32us       0
# Rows sent      0      41       0       1    0.08    0.99    0.27       0
# Rows examine   1   4.97M       0 445.49k   9.80k   5.73k  49.33k       0
# Rows affecte   0       2       0       1    0.00       0    0.06       0
# Rows read      1   2.01M       0 250.47k   3.96k    1.96  27.94k    0.99
# Bytes sent     0 241.20k      11   8.01k  475.89  918.49  689.98  258.32
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0      15       0       1    0.03       0    0.17       0
# Tmp disk tbl   0       3       0       1    0.01       0    0.08       0
# Tmp tbl size   0   4.78k       0   4.78k    9.43       0  211.60       0
# Query size     0 100.95k      19   2.71k  199.17  363.48  206.60  151.03
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin   1  67.99k       0  10.64k   1.26k   3.88k   2.47k   31.70
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Filesort       0% yes,  99% no
# Full scan      7% yes,  92% no
# QC Hit        78% yes,  21% no
# Tmp table      2% yes,  97% no
# Tmp table on   0% yes,  99% no
# String:
# Databases    prod_db
# Hosts        localhost
# InnoDB trxID 1153145C (2/0%), 11531626 (2/0%)... 43 more
# Last errno   0
# Users        prod
# Query_time distribution
#   1us
#  10us  ################################################################
# 100us  #########
#   1ms  #
#  10ms  #
# 100ms  #
#    1s  ###
#  10s+  #
160847

The stats here are aggregated per all queries which is great, but I still need to figure out what queries were run. I could use mk-log-player and split all sessions that way, unfortunately mk-log-player will not have all the other useful information, not even query timing. Instead, I’ve used mk-query-digest:

mk-query-digest --filter='$event->{Thread_id} == 160847' in > out

Now I know exactly what needs to be fixed first to make the greatest impact to this page response time. I can also convert that into a slow query log that lists all the queries that were executed during this session in the order they were executed:

mk-query-digest --filter='$event->{Thread_id} == 160847' --no-report --print in > out

Pretty cool, isn’t it? Sure, it would be even better if mk-query-digest would do a nested group-by and order-by within a group so I would avoid the extra step, but then even better than that would be if it would optimize the queries all together! Unfortunately mk-query-digest won’t do that for you, but then there’s mk-query-advisor ;)
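
For anyone wondering what the two steps boil down to, here is a small Python sketch of the idea: first sum query time per thread and rank the threads, then pull out every query of the chosen thread in execution order. It is only an illustration of the grouping, not how mk-query-digest is implemented; the event dictionaries and every value in them are made up, apart from thread id 160847, which is the one from the report above.

```python
from collections import defaultdict


def slowest_sessions(events, limit=10):
    """Rank threads by total query time: (thread_id, query_count, total_time)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        totals[event["Thread_id"]] += event["Query_time"]
        counts[event["Thread_id"]] += 1
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [(tid, counts[tid], totals[tid]) for tid in ranked[:limit]]


def session_queries(events, thread_id):
    """All events of one thread, in the order they appear in the log."""
    return [event for event in events if event["Thread_id"] == thread_id]


# Hypothetical parsed slow-log events (times in seconds).
events = [
    {"Thread_id": 160847, "Query_time": 33.0, "query": "query A"},
    {"Thread_id": 160848, "Query_time": 0.5, "query": "query B"},
    {"Thread_id": 160847, "Query_time": 0.25, "query": "query C"},
]

top_thread = slowest_sessions(events, limit=1)[0][0]
for event in session_queries(events, top_thread):
    print(event["Query_time"], event["query"])
```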

Jan
10
2010
--

Active Cache for MySQL

One of the problems I have with Memcache is that the cache is passive: it only stores cached data. This means an application using Memcache has to have special logic to handle misses from the cache and has to be careful when updating the cache – you may have multiple data modifications happening at the same time. Finally, you pay with increased latency when reconstructing items that have expired from the cache, while they could have been refreshed in the background. I think all of these problems could be solved with the concept of an active cache.

The idea with Active Cache is very simple – for any data retrieval operation the cache would actually know how to construct the object, so you will never get a miss from the cache, unless there is an error. Among existing tools, this probably maps best onto registering the jobs with Gearman.

The updates of the data in this case should go through the same system so you can get serialization (or other logic) for your data updates.

You could also use the same functions updating the data when it expires. This could be exposed as explicit logic, something like expires in 300 seconds, start refresh in 200 seconds as well as automated.
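
To make the idea a bit more tangible, here is a minimal in-process Python sketch, assuming a single loader function that knows how to build any value. The class and method names are hypothetical, and `run_pending_refreshes` is only a stand-in for whatever background worker (a Gearman job, for example) would do the refreshing in a real system.

```python
import time


class ActiveCache:
    """Cache that knows how to build its own values, so get() never misses."""

    def __init__(self, loader, expires=300.0, refresh_after=200.0,
                 clock=time.monotonic):
        self._loader = loader              # builds the value for a key
        self._expires = expires            # too old to serve after this age
        self._refresh_after = refresh_after
        self._clock = clock
        self._items = {}                   # key -> (value, stored_at)
        self._pending = set()              # keys waiting for a refresh

    def _load(self, key):
        value = self._loader(key)
        self._items[key] = (value, self._clock())
        return value

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return self._load(key)         # build it instead of reporting a miss
        value, stored_at = entry
        age = self._clock() - stored_at
        if age >= self._expires:
            return self._load(key)         # expired: rebuild before answering
        if age >= self._refresh_after:
            self._pending.add(key)         # still servable: refresh in background
        return value

    def update(self, key, value):
        """Writes go through the cache too, so they are serialized in one place."""
        self._items[key] = (value, self._clock())
        self._pending.discard(key)

    def run_pending_refreshes(self):
        """Stand-in for a background worker that refreshes ageing keys."""
        for key in sorted(self._pending):
            self._load(key)
        self._pending.clear()


cache = ActiveCache(loader=lambda key: f"built:{key}")
print(cache.get("user:1"))   # built on first access, served from memory afterwards
```

The caller never sees a miss: a value between 200 and 300 seconds old is served as is and queued for refresh, and only a value older than 300 seconds costs the caller a rebuild.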

The logic for automatic handling could be as follows – after the key has expired we can purge its value but keep the key in the cache with an “expired” flag. If we see that the same key gets a lot of requests while it is expired, the cache could decide to refresh such keys based on available bandwidth.

Another extension to common caching methods I’d like to see is having max_age specified on GET request. In many applications expiration is not data driven but rather request driven. Consider for example posting the blog comment on this blog. If you’re the user who posted the comment you have to see it instantly to avoid bad experience. At the same time other users can continue reading stale data – if they see comment appearing 10 seconds later they will not have any bad user experience.
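
A request-driven max_age could look roughly like the Python sketch below. This is just an illustration with hypothetical names, and I am assuming that max_age=0 means "always rebuild", which is what the commenting user in the example would ask for.

```python
import time


class MaxAgeCache:
    """Cache where each reader states how stale a value it can accept."""

    def __init__(self, loader, clock=time.monotonic):
        self._loader = loader
        self._clock = clock
        self._items = {}   # key -> (value, stored_at)

    def get(self, key, max_age):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and now - entry[1] < max_age:
            return entry[0]                # fresh enough for this caller
        value = self._loader(key)          # too old for this caller: rebuild
        self._items[key] = (value, now)
        return value


comments = ["first!"]
cache = MaxAgeCache(loader=lambda key: list(comments))

cache.get("post:1:comments", max_age=10)      # a reader warms the cache
comments.append("my new comment")
print(cache.get("post:1:comments", max_age=10))  # other readers may still see 1 comment
print(cache.get("post:1:comments", max_age=0))   # the author sees both right away
```

A nice side effect is that the author's fresh read also updates the stored copy, so other readers pick up the new comment on their next request.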

Finally, I think Active Cache could be very helpful in handling write-back scenarios. There are many cases when there are a lot of updates happening to the data – counters, last login, scores etc. – which do not really need to be reflected in the database instantly. If the cache itself “knows” how to update the data, you could define policies on how frequently the data object needs to be synced to the database.
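
As a rough illustration of such a policy, the Python sketch below accumulates counter increments in memory and writes the accumulated deltas to the database at most once per interval. The names are hypothetical and `write_to_db` stands for whatever function actually runs the UPDATE; anything not yet flushed would of course be lost if the cache process died, which is the usual trade-off with write-back.

```python
import time


class WriteBackCounters:
    """Accumulate increments in memory and sync them on a time-based policy."""

    def __init__(self, write_to_db, sync_every=60.0, clock=time.monotonic):
        self._write_to_db = write_to_db    # called as write_to_db(key, delta)
        self._sync_every = sync_every      # policy: seconds between syncs
        self._clock = clock
        self._pending = {}                 # key -> delta not yet in the database
        self._last_sync = clock()

    def incr(self, key, amount=1):
        self._pending[key] = self._pending.get(key, 0) + amount
        if self._clock() - self._last_sync >= self._sync_every:
            self.flush()

    def flush(self):
        """Push every accumulated delta to the database in one pass."""
        for key, delta in self._pending.items():
            self._write_to_db(key, delta)
        self._pending.clear()
        self._last_sync = self._clock()


database = {}

def apply_delta(key, delta):
    # Stand-in for something like: UPDATE counters SET n = n + delta WHERE k = key
    database[key] = database.get(key, 0) + delta

counters = WriteBackCounters(apply_delta, sync_every=60.0)
for _ in range(1000):
    counters.incr("page_views")   # a thousand updates, no database writes yet
counters.flush()                  # one write carries all of them
print(database)
```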

I’d like to hear some feedback if you think such concept would be helpful for your applications and if you think there are existing tools and technologies which can be used to conveniently build things like this.


Entry posted by peter