IBM Acquires Cleversafe To Accelerate Hybrid Cloud Storage Strategy

This morning IBM announced plans to acquire Cleversafe, a Chicago-based data storage vendor. The move appears to be a way to shore up IBM’s hybrid cloud storage chops. Terms of the deal were not available. Cleversafe was founded in 2004, and was well ahead of its time when it came to anticipating the big data revolution. According to its website, the company foresaw the coming data… Read More


Training my Dragon


Hello readers. This post is going to be a little different, and I ask you in advance to please ignore all typos and mistakes. You'll get to see why in a moment.

A couple of years ago, I experimented with dictation for writing my books, instead of typing. I did so for about a month, and then returned to the keyboard, having found that dictation was too slow. Recently I had a chance to talk to other authors who were using dictation, and I learned from them that I should have persevered more to perfect the art of dictating a book. So over the last few weeks I've been trying again.

At first, like before, I found dictation frustratingly slow. Not because you have to speak really slowly (in fact, you can talk at normal speed), but because it just wasn't natural for me to pre-think my sentences and then say them out loud. Now, after a few weeks, I find the dictation coming more naturally. It does help to pre-form each clause in my mind and then speak it, but I no longer have huge gaps between each comma, and I can continue dictating at a reasonable speed. Already I've noticed a higher word count per hour dictating than I achieve typing, and I'm not a slouch at typing either. Another discovery I made while dictating was that my back muscles became less tense, probably because I wasn't hunched over the keyboard. Now I can sit back comfortably, with one eye on the screen. You might want to consider this if you suffer from carpal tunnel or other wrist maladies.

Right about this point, you've probably realized that I am dictating this post, and you can see some typos. This brings me to another point: dictation is best used for the first draft, when you want to just keep going and not go back to correct mistakes. The first draft should be all about getting the story down, and you will edit it into shape later, so at this point you won't worry about typos. Another relevant point is that dictation breaks down easily if your writing involves a lot of jargon: sci-fi names, fantasy names, made-up words, etc. Again, these can easily be fixed in the second, edited draft. To stop myself going insane, I usually replace my fantasy character names with a name that the dictation software easily understands, like Thomas. It's just a simple find-and-replace later.

As you can see, dictation works remarkably well. I promise I haven't done any edits on this text whatsoever. This is exactly what is coming out of me speaking into the microphone, and speaking at a regular, fast pace. For a quiet night touch typist like myself (can you guess what I really said, instead of quiet night?), dictation often produces fewer typos than when I type.

There are two things you need to do before you can get stuck into dictation. First, you need to train your software. You'll see this commonly referred to as training your Dragon, primarily because the leading dictation software is Dragon NaturallySpeaking, by Nuance. The Dragon software comes with training programs, little verses and exercises that you can speak so that Dragon can learn the peculiarities of your voice. It's a painless process. Don't skip this step, because the better trained your Dragon is, the better the results you get. As you can see, my Dragon is trained to more than 90% effectiveness.

The second thing you need to train is yourself. Take the time to persevere. It will be slow and frustrating at first. Don't give up like I did the first time I tried this. Accept that you will be slow in your writing and make more mistakes, and in doing so you will begin to forge new pathways in your brain between your thought processes and your speech. Then you will achieve a much more natural flow between thinking up ideas and getting them onto the page by dictation.

There's another advantage you have with dictation: you don't have to be at your computer. Dragon has a smartphone app that allows you to dictate on the go, perhaps during your commute or on a walk, and then you can sync that audio file to your computer later, and it will create all the text at once. And as you can see, you can do a lot more things with dictation than just write your book, such as writing blog posts or emails.

Here are a few techie details for those who are interested. I'm using Dragon 3 for the Mac, which is actually a really old version of Dragon, but it still works perfectly well, even on the brand-new El Capitan OS X for the Mac. Of course there are Windows versions too. Many authors have bought expensive microphones, claiming, and rightfully so, that better mics lead to better results. I just use the free USB headset that came with Dragon, and as you can see, it does the job extremely well.

If you are interested in dictation, feel free to email me or drop me a comment below. I can recommend a really neat Facebook group, and you can also find several books about dictating with Dragon on Amazon. I'll drop out of dictation mode in a moment and provide a link to one such book: Dictate Your Book, by Monica Leonelle.

I hope this post inspires you to try dictation. I promise that I have done no editing on this post whatsoever; it's entirely what came out of my dictation. Good luck, and have fun!


Turing Email Makes Checking Your Inbox As Fun As Using Slack

The team at Turing Email is setting out to fix the little things that make email so tedious. Although many solutions to replace email for internal communications are growing in popularity today, Turing Email takes a step back to the basics and focuses on the little inconveniences that make email such a groan-inducing part of the workday. Stewart Macgregor-Dennis, the company’s founder… Read More


Cooperation Is The New Normal At Microsoft

When Microsoft settled its patent lawsuits with Google yesterday, it ended a long battle with the search giant, but it wasn’t an isolated action. It was part of a continuing pattern of serving its customers instead of its corporate ego. Whether it’s evolving agreements with Apple, Salesforce or Box or putting an end to senseless lawsuits, Microsoft is attempting to… Read More


Capture database traffic using the Performance Schema

Capturing data is a critical part of performing a query analysis, or even of just having an idea of what’s going on inside the database.

There are several known ways to achieve this. For example (a quick sketch of how to turn each one on follows the list):

  • Enable the General Log
  • Use the Slow Log with long_query_time = 0
  • Capture packets that go to MySQL from the network stream using TCPDUMP 
  • Use pt-query-digest with the --processlist parameter
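
A minimal sketch of how each option might be enabled, assuming a server on the default port 3306 and a user with the SUPER privilege (the tcpdump incantation is adapted from the pt-query-digest documentation; the pt-query-digest option itself is shown further down):

mysql> SET GLOBAL general_log = 'ON';       -- General Log
mysql> SET GLOBAL slow_query_log = 'ON';    -- Slow Log...
mysql> SET GLOBAL long_query_time = 0;      -- ...capturing every query
$ tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 > mysql.tcp.txt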

However, these methods can add significant overhead and might even have negative performance consequences.

Now, sometimes you just need to sneak a peek at the traffic. Nothing fancy. In that case, probably the fastest and easiest way to gather some traffic data is to use pt-query-digest with the --processlist option. It doesn’t require any change in the server’s configuration nor critical handling of files. It doesn’t even require access to the server, just a user with the proper permissions to run "show full processlist". But, and this is a significantly big "but," you have to take into account that polling the SHOW PROCESSLIST command misses quite a number of queries and gives very poor timing resolution, among other things (like the processlist mutex).
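
For instance, something along these lines (the DSN values are placeholders; --interval and --run-time are documented pt-query-digest options):

$ pt-query-digest --processlist h=localhost,u=percona,p=secret --interval=0.01 --run-time=60 > processlist-digest.txt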

What’s the alternative? Use the Performance Schema. Specifically: the events_statements_* and threads tables.

First, we have to make sure that the corresponding consumers are enabled:

mysql> select * from setup_consumers where name like 'events%statement%' and enabled = 'yes';
+--------------------------------+---------+
| NAME                           | ENABLED |
+--------------------------------+---------+
| events_statements_current      | YES     |
| events_statements_history      | YES     |
| events_statements_history_long | YES     |
+--------------------------------+---------+
3 rows in set (0.00 sec)
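
If any of them is missing from that filtered output, it can be switched on at runtime, no restart needed; for example:

mysql> UPDATE setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE 'events_statements%';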

Additionally, for statistics to be collected for statements, it is not sufficient to enable only the final statement/sql/* instruments used for individual statement types. The abstract statement/abstract/* instruments must be enabled as well. This should not normally be an issue because all statement instruments are enabled by default.
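A quick way to verify the abstract instruments, using the standard setup_instruments table:

mysql> SELECT NAME, ENABLED, TIMED
    -> FROM setup_instruments
    -> WHERE NAME LIKE 'statement/abstract/%';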

If you can’t see the events_statements_* consumers in your setup_consumers table, you’re probably running a MySQL version prior to 5.6.3; before that version, the events_statements_* tables didn’t exist. Speaking of which, MySQL 5.6 might not be as widely used as it should be, as was already pointed out on this same blog.

Before continuing, it’s important to note the most important condition at the moment of capturing data:

The statements must have ended.

If a statement is still being executed, it can’t be part of the collected traffic. For those out there who want to know what’s running inside MySQL, there’s already a detailed non-blocking processlist view to replace [INFORMATION_SCHEMA. | SHOW FULL] PROCESSLIST, available with the Sys Schema (which will come by default in MySQL 5.7).
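
With the Sys Schema installed, that view can be queried like any other (the column names below are as defined by the sys.processlist view):

mysql> SELECT conn_id, user, current_statement FROM sys.processlist;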

Our options for capturing data are to get it from one of the three available tables: events_statements_current, events_statements_history or events_statements_history_long.

First option: use the events_statements_current table, which contains current statement events. Since we only want to get statements that have ended, the query needs the condition END_EVENT_ID IS NOT NULL; this column is set to NULL when the event starts and updated to the thread’s current event number when the event ends. When testing, however, there were too many missing queries. This is probably because, between iterations, the associated threads were removed from the threads table, or simply because the window between END_EVENT_ID being updated and the row being removed from the table is too short. This option is discarded.
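
For reference, the discarded approach amounted to polling something like this:

mysql> SELECT THREAD_ID, EVENT_ID, SQL_TEXT
    -> FROM performance_schema.events_statements_current
    -> WHERE END_EVENT_ID IS NOT NULL;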

Second option: the events_statements_history table contains the most recent statement events per thread. Since statement events are not added to it until they have ended, using this table does the trick without any additional condition to know whether the event is still running. Also, as new events are added, older events are discarded if the table is full.

That means the table size is fixed. You can change the size by modifying the variable performance_schema_events_statements_history_size. In the server version I used (5.6.25-73.1-log Percona Server (GPL), Release 73.1, Revision 07b797f), the table size is, by default, autosized (-1) and holds 10 rows per thread. For example: if you are running 5 threads, the table will have 50 rows.
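
Note that this is a startup option rather than a dynamic variable, so changing it means editing the configuration file and restarting. For example:

mysql> SHOW GLOBAL VARIABLES LIKE 'performance_schema_events_statements_history_size';

# my.cnf
[mysqld]
performance_schema_events_statements_history_size = 25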

Since it has a fixed size, chances are that some events might be lost between iterations.

Third option: the events_statements_history_long table, which is kind of an extended version of the events_statements_history table. Depending on the MySQL version, by default it can hold up to 10000 rows or be autosized (also modifiable, with the variable performance_schema_events_statements_history_long_size).

One major (and not cool at all) drawback of this table is that “when a thread ends, its rows are removed from the table.” So it is not truly historical data: it only goes back as far as the oldest thread that is still alive.

The logical option to choose would be the third one: use the events_statements_history_long table. I’ve created a small script (available here) to collect infinite iterations on all the events per thread between a range of event_id’s. The idea of the range is to avoid capturing the same event more than once. It turns out, though, that executing a query against this table is pretty slow: somewhere between 0.53 and 1.96 seconds per run. It can behave in quite an invasive way.
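
Conceptually, the script keeps, per thread, the last EVENT_ID it has seen, and on each pass fetches only newer events; something like the following, where the two placeholder variables are maintained by the script between iterations:

mysql> SELECT THREAD_ID, EVENT_ID, SQL_TEXT
    -> FROM performance_schema.events_statements_history_long
    -> WHERE THREAD_ID = @thread_id
    ->   AND EVENT_ID > @last_seen_event_id;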

Which leaves us with the second option: the events_statements_history table.

Since the goal is to capture data in a slow log format, additional information needs to be obtained from the threads table, which has a row for each server thread. The most important thing to remember: access to threads does not require a mutex and has minimal impact on server performance.

Combined, these two tables give us enough information to simulate a very comprehensive slow log format. We just need the proper query:

SELECT CONCAT_WS(
'','# Time: ', date_format(CURDATE(),'%y%m%d'),' ',TIME_FORMAT(NOW(6),'%H:%i:%s.%f'),'\n'
,'# User@Host: ',t.PROCESSLIST_USER,'[',t.PROCESSLIST_USER,'] @ ',t.PROCESSLIST_HOST,' []  Id: ',t.PROCESSLIST_ID,'\n'
,'# Schema: ',CURRENT_SCHEMA,'  Last_errno: ',MYSQL_ERRNO,'  ','\n'
,'# Query_time: ',ROUND(s.TIMER_WAIT / 1000000000000, 6),' Lock_time: ',ROUND(s.LOCK_TIME / 1000000000000, 6),'  Rows_sent: ',ROWS_SENT,'  Rows_examined: ',ROWS_EXAMINED,'  Rows_affected: ',ROWS_AFFECTED,'\n'
,'# Tmp_tables: ',CREATED_TMP_TABLES,'  Tmp_disk_tables: ',CREATED_TMP_DISK_TABLES,'  ','\n'
,'# Full_scan: ',IF(SELECT_SCAN=0,'No','Yes'),'  Full_join: ',IF(SELECT_FULL_JOIN=0,'No','Yes'),'  Tmp_table: ',IF(CREATED_TMP_TABLES=0,'No','Yes'),'  Tmp_table_on_disk: ',IF(CREATED_TMP_DISK_TABLES=0,'No','Yes'),'\n'
,SQL_TEXT,';'
) AS slow_log_entry
FROM performance_schema.events_statements_history s
JOIN performance_schema.threads t USING(thread_id)
WHERE t.TYPE = 'FOREGROUND'
AND t.PROCESSLIST_INFO IS NOT NULL
AND t.PROCESSLIST_ID != connection_id();

The idea of this query is to get a Slow Log format as close as possible to the one that can be obtained by using all the options from the log_slow_filter variable.

The other conditions are:

  • t.TYPE = 'FOREGROUND': The threads table provides information about background threads, which we don’t intend to analyze. User connection threads are foreground threads.
  • t.PROCESSLIST_INFO IS NOT NULL: This field is NULL if the thread is not executing any statement.
  • t.PROCESSLIST_ID != connection_id(): Ignore me (this query).

The output of the query will look like a proper Slow Log output:

# Time: 150928 18:13:59.364770
# User@Host: root[root] @ localhost []  Id: 58918
# Schema: test  Last_errno: 0
# Query_time: 0.000112 Lock_time: 0.000031  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0
# Tmp_tables: 0  Tmp_disk_tables: 0
# Full_scan: No  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
INSERT INTO sbtest1 (id, k, c, pad) VALUES (498331, 500002, '94319277193-32425777628-16873832222-63349719430-81491567472-95609279824-62816435936-35587466264-28928538387-05758919296'
, '21087155048-49626128242-69710162312-37985583633-69136889432');

And this file can be used with pt-query-digest to aggregate similar queries, just as if it were regular slow log output. I ran a small test consisting of:

  • Generate traffic using sysbench. This is the sysbench command used:
    sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=localhost --mysql-port=3306 --mysql-user=root --mysql-password= --mysql-db=test --mysql-table-engine=innodb --oltp-test-mode=complex --oltp-read-only=off --oltp-reconnect=on --oltp-table-size=1000000 --max-requests=100000000 --num-threads=4 --report-interval=1 --report-checkpoints=10 --tx-rate=0 run
  • Capture the data using the slow log with long_query_time = 0
  • Capture data using pt-query-digest --processlist
  • Capture data from the Performance Schema
  • Run pt-query-digest on the 3 files (a sketch of this step follows the list)
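
That last step is a plain pt-query-digest invocation over each capture; the file names here are placeholders, and the --processlist capture can be written in slow log format with the documented --output slowlog option and digested the same way:

$ pt-query-digest mysql-slow.log > digest-slowlog.txt
$ pt-query-digest processlist-capture.log > digest-processlist.txt
$ pt-query-digest ps-capture.log > digest-ps.txt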

The results were:

– Slow Log:

# Profile
# Rank Query ID           Response time Calls  R/Call V/M   Item
# ==== ================== ============= ====== ====== ===== ==============
#    1 0x813031B8BBC3B329 47.7743 18.4%  15319 0.0031  0.01 COMMIT
#    2 0x737F39F04B198EF6 39.4276 15.2%  15320 0.0026  0.00 SELECT sbtest?
#    3 0x558CAEF5F387E929 37.8536 14.6% 153220 0.0002  0.00 SELECT sbtest?
#    4 0x84D1DEE77FA8D4C3 30.1610 11.6%  15321 0.0020  0.00 SELECT sbtest?
#    5 0x6EEB1BFDCCF4EBCD 24.4468  9.4%  15322 0.0016  0.00 SELECT sbtest?
#    6 0x3821AE1F716D5205 22.4813  8.7%  15322 0.0015  0.00 SELECT sbtest?
#    7 0x9270EE4497475EB8 18.9363  7.3%   3021 0.0063  0.00 SELECT performance_schema.events_statements_history performance_schema.threads
#    8 0xD30AD7E3079ABCE7 12.8770  5.0%  15320 0.0008  0.01 UPDATE sbtest?
#    9 0xE96B374065B13356  8.4475  3.3%  15319 0.0006  0.00 UPDATE sbtest?
#   10 0xEAB8A8A8BEEFF705  8.0984  3.1%  15319 0.0005  0.00 DELETE sbtest?
# MISC 0xMISC              8.5077  3.3%  42229 0.0002   0.0 <10 ITEMS>

– pt-query-digest --processlist

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x737F39F04B198EF6 53.4780 16.7%  3676 0.0145  0.20 SELECT sbtest?
#    2 0x813031B8BBC3B329 50.7843 15.9%  3577 0.0142  0.10 COMMIT
#    3 0x558CAEF5F387E929 50.7241 15.8%  4024 0.0126  0.08 SELECT sbtest?
#    4 0x84D1DEE77FA8D4C3 35.8314 11.2%  2753 0.0130  0.11 SELECT sbtest?
#    5 0x6EEB1BFDCCF4EBCD 32.3391 10.1%  2196 0.0147  0.21 SELECT sbtest?
#    6 0x3821AE1F716D5205 28.1566  8.8%  2013 0.0140  0.17 SELECT sbtest?
#    7 0x9270EE4497475EB8 22.1537  6.9%  1381 0.0160  0.22 SELECT performance_schema.events_statements_history performance_schema.threads
#    8 0xD30AD7E3079ABCE7 15.4540  4.8%  1303 0.0119  0.00 UPDATE sbtest?
#    9 0xE96B374065B13356 11.3250  3.5%   885 0.0128  0.09 UPDATE sbtest?
#   10 0xEAB8A8A8BEEFF705 10.2592  3.2%   792 0.0130  0.09 DELETE sbtest?
# MISC 0xMISC              9.7642  3.0%   821 0.0119   0.0 <3 ITEMS>

– Performance Schema

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ==============
#    1 0x813031B8BBC3B329 14.6698 24.8% 12380 0.0012  0.00 COMMIT
#    2 0x558CAEF5F387E929 12.0447 20.4% 10280 0.0012  0.00 SELECT sbtest?
#    3 0x737F39F04B198EF6  7.9803 13.5% 10280 0.0008  0.00 SELECT sbtest?
#    4 0x3821AE1F716D5205  4.6945  7.9%  5520 0.0009  0.00 SELECT sbtest?
#    5 0x84D1DEE77FA8D4C3  4.6906  7.9%  7350 0.0006  0.00 SELECT sbtest?
#    6 0x6EEB1BFDCCF4EBCD  4.1018  6.9%  6310 0.0007  0.00 SELECT sbtest?
#    7 0xD30AD7E3079ABCE7  3.7983  6.4%  3710 0.0010  0.00 UPDATE sbtest?
#    8 0xE96B374065B13356  2.3878  4.0%  2460 0.0010  0.00 UPDATE sbtest?
#    9 0xEAB8A8A8BEEFF705  2.2231  3.8%  2220 0.0010  0.00 DELETE sbtest?
# MISC 0xMISC              2.4961  4.2%  2460 0.0010   0.0 <7 ITEMS>

The P_S data is closer to the Slow Log data than what was captured with the regular SHOW FULL PROCESSLIST, but it is still far from accurate. Remember that this is an alternative for a fast and easy way to capture traffic without too much trouble, so that’s a trade-off you might have to accept.

Summary: capturing traffic always comes with a trade-off, but if you’re willing to sacrifice some accuracy, it can be done with minimal impact on server performance using the Performance Schema. Because P_S has been enabled by default since MySQL 5.6.6, you might already be living with the overhead (if you’re using 5.6). If you are one of the lucky ones who have P_S in production, don’t be afraid to use it. There’s a lot of data already in there.



Asana Debuts New Version Of Its Collab Software, Claims 140K Companies Use Its Service

This morning at Asana’s San Francisco headquarters, the startup announced a new version of its product, a new brand (read: logo) and, happily, some metrics about its growth. As you will recall, the startup picked up a new Head of Business from Google on Monday, announcing at the same time that it has expanded its revenue by 230 percent over the past year. Asana announced… Read More


NodePrime Raises $7M In Seed Funding To Help Companies Manage Their Data Centers

Managing a complex modern IT infrastructure is hard. NodePrime, a San Francisco-based startup that’s emerging from stealth today, wants to make it easier for companies to map, monitor and control their data centers. The company also today announced that it has raised a $7 million seed round with participation from the likes of Menlo Ventures, NEA, Formation 8, Ericsson, Initialized… Read More


AI Is About To Go Mainstream And Reshape The Workplace

Much like the evolution of systems design, IT-enabled process change ebbs and flows over time. We see this in history as each technology revolution brings with it a refactoring of business operations. We’re about to see another “flow” of transformation thanks to resurgent interest in artificial intelligence and the emergence of the AI-powered business application. Read More


Interviewed Raises A $2M Seed Round, Expands Beyond Basic Job Simulations

Interviewed, a company that provides simulations for job candidates, has closed a $2 million seed round with participation from YC, SV Angel, Red Swan Ventures, and Jason Calacanis, among others.
The company also announced that after six months it has given almost 100,000 job assessments, with that number growing from 240 assessments in the first month to 43,500 assessments last month. Read More


GE Predicts Predix Platform Will Generate $6B In Revenue This Year

Like many big companies, GE has been in the process of trying to reinvent itself, and Predix, its Industrial Internet of Things platform, has been a big part of that.
Today, at its annual Minds + Machines conference in San Francisco, GE announced that the Predix platform had grown into a big business with $5 billion in revenues and $6 billion in orders expected this year.
The company also… Read More