Aug
25
2020
--

Microsoft brings transcriptions to Word

Microsoft today launched Transcribe in Word, its new transcription service for Microsoft 365 subscribers, into general availability. It’s now available in the online version of Word, with other platforms to follow later. Word is also getting new dictation features that allow you to use your voice to format and edit your text, for example.

As the name implies, this new feature lets you transcribe conversations, both live and pre-recorded, and then edit those transcripts right inside of Word. With this, the company goes head-to-head with startups like Otter and Google’s Recorder app, though they all have their own pros and cons.

Image Credits: Microsoft

To get started with Transcribe in Word, you simply head for the Dictate button in the menu bar and click on “Transcribe.” From there, you can record a conversation as it happens — by recording it directly through a speakerphone and your laptop’s microphone, for example — or by recording it in some other way and then uploading that file. The service accepts .mp3, .wav, .m4a and .mp4 files.

As Dan Parish, Microsoft principal group program manager for Natural User Interface & Incubation, noted in a press briefing ahead of today’s announcement, when you record a call live, the transcription actually runs in the background while you conduct your interview, for example. The team purposely decided not to show you the live transcript, though, because its user research showed that it was distracting. I admit that I like to see the live transcript in Otter and Recorder, but maybe I’m alone in that.

As with other services, Transcribe in Word lets you click on an individual paragraph in the transcript and then listen to the corresponding audio at a variety of speeds. Because the automated transcript will inevitably have errors in it, that’s a must-have feature. Sadly, though, Transcribe doesn’t let you click on individual words.

One major limitation of the service right now is that if you like to record offline and then upload your files, you’ll be limited to 300 minutes per month, without the ability to extend this for an extra fee, for example. I know I often transcribe far more than five hours of interviews in any given month, so that limit seems low, especially given that Otter provides me with 6,000 minutes on its cheapest paid plan. The maximum length for a transcript on Otter is four hours, while Microsoft’s only restriction is a 200MB limit on uploaded files, with no limits on live recordings.

Another issue I noticed is that if you accidentally close the browser tab with Word in it, the transcription process stops and there doesn’t seem to be a way to restart it.

It also takes quite a while for uploaded files to be transcribed: roughly as long as the conversations themselves, in my experience. The results are very good, though, and often better than those of competing services. Transcribe in Word also does a nice job of separating out the different speakers in a conversation. For privacy reasons, you must assign your own names to those speakers, even when you regularly record the same people.

It’d be nice to get the same feature in something like OneNote, for example, and my guess is Microsoft may expand this to its note-taking app over time. To me, that’s the more natural place for it.

Image Credits: Microsoft

The new dictation features in Word now let you give commands like “bold the last sentence,” for example, and say “percentage sign” or “ampersand” if you need to add those symbols to a text (or “smiley face,” if those are the kinds of texts you write in Word).

Even if you don’t often need to transcribe text, this new feature shows how Microsoft is using its subscription service to launch premium features that convert free users into paying ones. I’d be surprised if tools like Microsoft Editor (which offers more features for paying users), this transcription service and some of the new AI features in the likes of Excel and PowerPoint didn’t help win over some of those users, especially now that the company has folded Office 365 and Microsoft 365 for consumers into a single bundle. After all, subscriptions to the likes of Grammarly and Otter alone would be significantly more expensive than a Microsoft 365 subscription.

 

Apr
22
2020
--

Medallia acquires voice-to-text specialist Voci Technologies for $59M

M&A has largely slowed down in the current market, but there remain pockets of activity when the timing and price are right. Today, Medallia — a customer experience platform that scans online reviews, social media, and other sources to provide better insights into what a company is doing right and wrong and what needs to get addressed — announced that it would acquire Voci Technologies, a speech-to-text startup, for $59 million in cash.

Medallia plans to integrate the startup’s AI technology so that voice-based interactions — for example, calls into call centers — can be part of the data crunched by its analytics platform. Despite the rise of social media, messaging channels and (at the moment) a general shift toward doing more online, voice still accounts for the majority of customer interactions with a business, so this is an important area for Medallia to tackle.

“Voci transcribes 100% of live and recorded calls into text that can be analyzed quickly to determine customer satisfaction, adding a powerful set of signals to the Medallia Experience Cloud,” said Leslie Stretch, president and CEO of Medallia, in a statement. “At the same time, Voci enables call analysis moments after each interaction has completed, optimizing every aspect of call center operations securely. Especially important as virtual and remote contact center operations take shape.”

While there are a lot of speech-to-text offerings on the market today, the key with Voci is that it is able to discern a number of other details in a call, including emotion, gender, sentiment and voice biometric identity. It’s also able to filter out personally identifiable information to ensure more privacy when the data is used for further analytics.
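
Voci hasn’t published how its redaction actually works, but to make the idea concrete, here is a deliberately naive sketch of pattern-based PII masking on a transcript. The patterns and labels are invented for illustration only; production systems typically combine trained models with rules rather than a handful of regexes.

```python
# Illustrative sketch only: Voci's actual redaction pipeline is not public.
# Masks a few common PII patterns in a call transcript before analytics.
import re

PII_PATTERNS = {
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # payment card numbers
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US Social Security numbers
    "PHONE": re.compile(r"\b\d{3}[ .-]?\d{3}[ .-]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(transcript: str) -> str:
    """Replace matched spans with a label so downstream analytics never
    see the raw values."""
    for label, pattern in PII_PATTERNS.items():
        transcript = pattern.sub(f"[{label} REDACTED]", transcript)
    return transcript

print(redact("My card is 4111 1111 1111 1111, email jane@example.com"))
# -> "My card is [CARD REDACTED], email [EMAIL REDACTED]"
```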

Voci started life as a spinout from Carnegie Mellon University (its three founders were all PhDs from the school), and it had raised a total of about $18 million from investors that included Grotech Ventures, Harbert Growth Partners and the university itself. It was last valued at $28 million in March 2018 (during a Series B raise), meaning that today’s acquisition was slightly more than double that value.

The company seems to have been on an upswing with its business. Voci has to date processed some 2 billion minutes of speech, and in January, the company published some momentum numbers that said bookings had grown some 63% in the last quarter, boosted by contact center customers.

In addition to contact centers, the company catered to companies in finance, healthcare, insurance and other areas of business process outsourcing, although it does not disclose names. As with all companies whose products cater to offering services remotely, Voci has seen stronger demand for its business in recent weeks, at a time when many have curtailed physical contact due to COVID-19-related movement restrictions.

“Our whole company is delighted to be joining forces with experience management leader Medallia. We are thrilled that Voci’s powerful speech to text capabilities will become part of Medallia Experience Cloud,” said Mike Coney, CEO of Voci, in a statement. “The consolidation of all contact center signals with video, survey and other critical feedback is a game changer for the industry.”

It’s not clear whether Voci had been trying to raise money in the last few months, or if this was a proactive approach from Medallia. But more generally, M&A has found itself in a particularly key position in the world of tech: startups are finding it more challenging right now to raise money, and one big question has been whether that will lead to more hail-mary-style M&A plays, as one route for promising businesses and technologies to avoid shutting down altogether.

For its part, Medallia, which went public in July 2019 after raising money from the likes of Sequoia, has seen its stock hit like the rest of the market in recent weeks. Its current market cap is at around $2.8 billion, just $400 million more than its last private valuation.

The deal is expected to close in May 2020, Medallia said.

 

Aug
01
2019
--

Dasha AI is calling so you don’t have to

While you’d be hard-pressed to find any startup not brimming with confidence over the disruptive idea they’re chasing, it’s not often you come across a young company as calmly convinced it’s engineering the future as Dasha AI.

The team is building a platform for designing human-like voice interactions to automate business processes. Put simply, it’s using AI to make machine voices a whole lot less robotic.

“What we definitely know is this will definitely happen,” says CEO and co-founder Vladislav Chernyshov. “Sooner or later the conversational AI/voice AI will replace people everywhere where the technology will allow. And it’s better for us to be the first mover than the last in this field.”

“In 2018 in the U.S. alone there were 30 million people doing some kind of repetitive tasks over the phone. We can automate these jobs now or we are going to be able to automate it in two years,” he goes on. “If you multiply it with Europe and the massive call centers in India, Pakistan and the Philippines you will probably have something like close to 120 million people worldwide… and they are all subject for disruption, potentially.”

The New York-based startup has been operating in relative stealth up to now. But it’s breaking cover to talk to TechCrunch — announcing a $2 million seed round led by RTP Ventures and RTP Global, an early-stage investor that’s backed the likes of Datadog and RingCentral. RTP’s venture arm, also based in New York, writes on its website that it prefers engineer-founded companies that “solve big problems with technology.” “We like technology, not gimmicks,” the fund warns with added emphasis.

Dasha’s core tech right now includes what Chernyshov describes as “a human-level, voice-first conversation modelling engine;” a hybrid text-to-speech engine, which he says enables it to model speech disfluencies (the ums and ahs, pitch changes and so on that characterize human chatter); plus “a fast and accurate” real-time voice activity detection algorithm that detects speech in less than 100 milliseconds, meaning the AI can turn-take and handle interruptions in the conversation flow. The platform can also detect a caller’s gender — a feature that can be useful for healthcare use cases, for example.
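
Dasha hasn’t published its detection algorithm, but to make that sub-100-millisecond turn-taking budget concrete, here is a deliberately naive, energy-based sketch of what voice activity detection does. The frame size, threshold and sample rate are arbitrary illustrative values; production VADs are usually small trained models with noise tracking and hangover logic.

```python
# Naive, illustrative voice activity detection. Not Dasha's algorithm;
# real systems use trained models rather than a fixed energy threshold.
import numpy as np

def speech_flags(samples: np.ndarray, sample_rate: int = 16000,
                 frame_ms: int = 20, rms_threshold: float = 0.02) -> list:
    """Return one True/False "speech present" flag per 20 ms frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len].astype(np.float64)
        rms = np.sqrt(np.mean(frame ** 2))   # frame energy
        flags.append(bool(rms > rms_threshold))
    return flags

# A decision every 20 ms sits comfortably inside the ~100 ms turn-taking
# budget mentioned above.
one_second_of_noise = np.random.uniform(-0.01, 0.01, 16000)
print(sum(speech_flags(one_second_of_noise)), "of 50 frames flagged as speech")
```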

Another component Chernyshov flags is “an end-to-end pipeline for semi-supervised learning” — so it can retrain the models in real time “and fix mistakes as they go” — until Dasha hits the claimed “human-level” conversational capability for each business process niche. (To be clear, the AI cannot adapt its speech to an interlocutor in real time — as human speakers naturally shift their accents closer to bridge any dialect gap — but Chernyshov suggests it’s on the roadmap.)

“For instance, we can start with 70% correct conversations and then gradually improve the model up to say 95% of correct conversations,” he says of the learning element, though he admits there are a lot of variables that can impact error rates — not least the call environment itself. Even cutting edge AI is going to struggle with a bad line.

The platform also has an open API so customers can plug the conversation AI into their existing systems — be it telephony, Salesforce software or a developer environment, such as Microsoft Visual Studio.
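
Dasha’s API wasn’t publicly documented at the time of writing, so purely to illustrate what “plugging the conversation AI into existing systems” tends to look like in practice, here is a hypothetical REST-style sketch. The endpoint, payload fields and auth scheme are invented for illustration and are not Dasha’s actual interface.

```python
# Hypothetical sketch only: the URL, payload fields and auth header below
# are invented to illustrate the integration pattern, not Dasha's real API.
import requests  # pip install requests

payload = {
    "phone_number": "+15551234567",        # who the voice AI should call
    "scenario": "satisfaction_survey",     # which modelled conversation to run
    "callback_url": "https://crm.example.com/hooks/call-finished",  # e.g. a CRM-connected endpoint
}

resp = requests.post(
    "https://api.voice-platform.example/v1/calls",  # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. a call ID to poll, with results delivered to the callback
```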

Currently they’re focused on English, though Chernyshov says the architecture is “basically language agnostic” — but it does require “a big amount of data.”

The next step will be to open up the dev platform to enterprise customers, beyond the initial 20 beta testers, which include companies in the banking, healthcare and insurance sectors — with a release slated for later this year or Q1 2020.

Test use cases so far include banks using the conversation engine for brand loyalty management: running customer satisfaction surveys that can turn around negative feedback by fast-tracking a response to a bad rating, by providing (human) customer support agents with an automated categorization of the complaint so they can follow up more quickly. “This usually leads to a wow effect,” says Chernyshov.

Ultimately, he believes there will be two or three major AI platforms globally providing businesses with an automated, customizable conversational layer — sweeping away the patchwork of chatbots currently filling in the gap. And, of course, Dasha intends their “Digital Assistant Super Human Alike” to be one of those few.

“There is clearly no platform [yet],” he says. “Five years from now this will sound very weird that all companies now are trying to build something. Because in five years it will be obvious — why do you need all this stuff? Just take Dasha and build what you want.”

“This reminds me of the situation in the 1980s when it was obvious that the personal computers are here to stay because they give you an unfair competitive advantage,” he continues. “All large enterprise customers all over the world… were building their own operating systems, they were writing software from scratch, constantly reinventing the wheel just in order to be able to create this spreadsheet for their accountants.

“And then Microsoft with MS-DOS came in… and everything else is history.”

That’s not all they’re building, either. Dasha’s seed financing will be put toward launching a consumer-facing product atop its B2B platform to automate the screening of recorded message robocalls. So, basically, they’re building a robot assistant that can talk to — and put off — other machines on humans’ behalf.

Which does kind of suggest the AI-fueled future will entail an awful lot of robots talking to each other… ???

Chernyshov says this B2C call-screening app will most likely be free. But if your core tech looks set to massively accelerate a non-human caller phenomenon that many consumers already see as a terrible plague on their time and mind, then providing free relief — in the form of a counter AI — seems the very least you should do.

Not that Dasha can be accused of causing the robocaller plague, of course. Recorded messages hooked up to call systems have been spamming people with unsolicited calls for far longer than the startup has existed.

Dasha’s PR notes Americans were hit with 26.3 billion robocalls in 2018 alone — up “a whopping” 46% on 2017.

Its conversation engine, meanwhile, has only made some 3 million calls to date, clocking its first call with a human in January 2017. But the goal from here on in is to scale fast. “We plan to aggressively grow the company and the technology so we can continue to provide the best voice conversational AI to a market which we estimate to exceed $30 billion worldwide,” runs a line from its PR.

After the developer platform launch, Chernyshov says the next step will be to open access to business process owners by letting them automate existing call workflows without needing to be able to code (they’ll just need an analytic grasp of the process, he says).

Later — pegged for 2022 on the current roadmap — will be the launch of “the platform with zero learning curve,” as he puts it. “You will teach Dasha new models just like typing in a natural language and teaching it like you can teach any new team member on your team,” he explains. “Adding a new case will actually look like a word editor — when you’re just describing how you want this AI to work.”

His prediction is that a majority — circa 60% — of all the major cases businesses face — “like dispatching, like probably upsales, cross sales, some kind of support etc., all those cases” — will be able to be automated “just like typing in a natural language.”

So if Dasha’s AI-fueled vision of voice-based business process automation comes to fruition, then humans getting orders of magnitude more calls from machines looks inevitable — as machine learning supercharges artificial speech by making it sound slicker, act smarter and seem, well, almost human.

But perhaps a savvier generation of voice AIs will also help manage the “robocaller” plague by offering advanced call screening? And as non-human voice tech marches on from dumb recorded messages, to chatbot-style AIs running on scripted rails, to — as Dasha pitches it — fully responsive, emoting, even emotion-sensitive conversation engines that can slip right under the human radar, maybe the robocaller problem will eat itself? I mean, if you didn’t even realize you were talking to a robot, how are you going to get annoyed about it?

Dasha claims 96.3% of the people who talk to its AI “think it’s human,” though it’s not clear on what sample size the claim is based. (To my ear there are definite “tells” in the current demos on its website. But in a cold-call scenario it’s not hard to imagine the AI passing, if someone’s not paying much attention.)

The alternative scenario, in a future infested with unsolicited machine calls, is that all smartphone OSes add kill switches, such as the one in iOS 13 — which lets people silence calls from unknown numbers.

And/or more humans simply never pick up phone calls unless they know who’s on the end of the line.

So it’s really doubly savvy of Dasha to create an AI capable of managing robot calls — meaning it’s building its own fallback — a piece of software willing to chat to its AI in the future, even if actual humans refuse.

Dasha’s robocall screener app, which is slated for release in early 2020, will also be spammer-agnostic — in that it’ll be able to handle and divert human salespeople too, as well as robots. After all, a spammer is a spammer.

“Probably it is the time for somebody to step in and ‘don’t be evil,’ ” says Chernyshov, echoing Google’s old motto, albeit perhaps not entirely reassuringly given the phrase’s lapsed history — as we talk about the team’s approach to ecosystem development and how machine-to-machine chat might overtake human voice calls.

“At some point in the future we will be talking to various robots much more than we probably talk to each other — because you will have some kind of human-like robots at your house,” he predicts. “Your doctor, gardener, warehouse worker, they all will be robots at some point.”

The logic at work here is that if resistance to an AI-powered Cambrian Explosion of machine speech is futile, it’s better to be at the cutting edge, building the most human-like robots — and making the robots at least sound like they care.

Dasha’s conversational quirks certainly can’t be called a gimmick. Even if the team’s close attention to mimicking the vocal flourishes of human speech — the disfluencies, the ums and ahs, the pitch and tonal changes for emphasis and emotion — might seem so at first airing.

In one of the demos on its website you can hear a clip of a very chipper-sounding male voice, who identifies himself as “John from Acme Dental,” taking an appointment call from a female (human), and smoothly dealing with multiple interruptions and time/date changes as she changes her mind. Before, finally, dealing with a flat cancellation.

A human receptionist might well have got mad that the caller essentially just wasted their time. Not John, though. Oh no. He ends the call as cheerily as he began, signing off with an emphatic: “Thank you! And have a really nice day. Bye!”

If the ultimate goal is Turing Test levels of realism in artificial speech — i.e. a conversation engine so human-like it can pass as human to a human ear — you do have to be able to reproduce, with precision timing, the verbal baggage that’s wrapped around everything humans say to each other.

This tonal layer does essential emotional labor in the business of communication, shading and highlighting words in a way that can adapt or even entirely transform their meaning. It’s an integral part of how we communicate. And thus a common stumbling block for robots.

So if the mission is to power a revolution in artificial speech that humans won’t hate and reject, then engineering full spectrum nuance is just as important a piece of work as having an amazing speech recognition engine. A chatbot that can’t do all that is really the gimmick.

Chernyshov claims Dasha’s conversation engine is “at least several times better and more complex than [Google] Dialogflow, [Amazon] Lex, [Microsoft] Luis or [IBM] Watson,” dropping a laundry list of rival speech engines into the conversation.

He argues none are on a par with what Dasha is being designed to do.

The difference is the “voice-first modeling engine.” “All those [rival engines] were built from scratch with a focus on chatbots — on text,” he says, couching modeling voice conversation “on a human level” as much more complex than the more limited chatbot-approach — and hence what makes Dasha special and superior.

“Imagination is the limit. What we are trying to build is an ultimate voice conversation AI platform so you can model any kind of voice interaction between two or more human beings.”

Google did demo its own stuttering voice AI — Duplex — last year, when it also took flak for a public demo in which it appeared not to have told restaurant staff up front they were going to be talking to a robot.

Chernyshov isn’t worried about Duplex, though, saying it’s a product, not a platform.

“Google recently tried to headhunt one of our developers,” he adds, pausing for effect. “But they failed.”

He says Dasha’s engineering staff make up more than half (28) its total headcount (48), and include two doctorates of science; three PhDs; five PhD students; and 10 masters of science in computer science.

It has an R&D office in Russia, which Chernyshov says helps make the funding go further.

“More than 16 people, including myself, are ACM ICPC finalists or semi finalists,” he adds — likening the competition to “an Olympic game but for programmers.” A recent hire — chief research scientist, Dr. Alexander Dyakonov — is both a doctor of science professor and former Kaggle No.1 GrandMaster in machine learning. So with in-house AI talent like that you can see why Google, uh, came calling…


But why not have Dasha ID itself as a robot by default? On that, Chernyshov says the platform is flexible — which means disclosure can be added. But in markets where it isn’t a legal requirement, the door is being left open for “John” to slip cheerily by. “Blade Runner,” here we come.

The team’s driving conviction is that emphasis on modeling human-like speech will, down the line, allow their AI to deliver universally fluid and natural machine-human speech interactions, which in turn open up all sorts of expansive and powerful possibilities for embeddable next-gen voice interfaces. Ones that are much more interesting than the current crop of gadget talkies.

This is where you could raid sci-fi/pop culture for inspiration. Such as KITT, the dryly witty talking car from the 1980s TV series “Knight Rider.” Or, to throw in a British TV reference, Holly, the self-deprecating yet sardonic human-faced computer in “Red Dwarf.” (Or, indeed, Kryten, the guilt-ridden android butler.) Chernyshov’s suggestion is to imagine Dasha embedded in a Boston Dynamics robot. But surely no one wants to hear those crawling nightmares scream…

Dasha’s five-year+ roadmap includes the eyebrow-raising ambition to evolve the technology to achieve “a general conversational AI.” “This is a science fiction at this point. It’s a general conversational AI, and only at this point you will be able to pass the whole Turing Test,” he says of that aim.

“Because we have a human-level speech recognition, we have human-level speech synthesis, we have generative non-rule based behavior, and this is all the parts of this general conversational AI. And I think that we can we can — and scientific society — we can achieve this together in like 2024 or something like that.

“Then the next step, in 2025, this is like autonomous AI — embeddable in any device or a robot. And hopefully by 2025 these devices will be available on the market.”

Of course, the team is still a dreaming distance away from that AI wonderland/dystopia (depending on your perspective) — even if it’s date-stamped on the roadmap.

But if a conversational engine ends up in command of the full range of human speech — quirks, quibbles and all — then designing a voice AI may come to be thought of as akin to designing a TV character or cartoon personality. So very far from what we currently associate with the word “robotic.” (And wouldn’t it be funny if the term “robotic” came to mean “hyper entertaining” or even “especially empathetic” thanks to advances in AI.)

Let’s not get carried away though.

In the meantime, there are “uncanny valley” pitfalls of speech disconnect to navigate if the tone being (artificially) struck hits a false note. (And, on that front, if you didn’t know “John from Acme Dental” was a robot you’d be forgiven for misreading his chipper sign off to a total time waster as pure sarcasm. But an AI can’t appreciate irony. Not yet anyway.)

Nor can robots appreciate the difference between ethical and unethical verbal communication they’re being instructed to carry out. Sales calls can easily cross the line into spam. And what about even more dystopic uses for a conversation engine that’s so slick it can convince the vast majority of people it’s human — like fraud, identity theft, even election interference… the potential misuses could be terrible and scale endlessly.

Although if you straight out ask Dasha whether it’s a robot Chernyshov says it has been programmed to confess to being artificial. So it won’t tell you a barefaced lie.


How will the team prevent problematic uses of such a powerful technology?

“We have an ethics framework and when we will be releasing the platform we will implement a real-time monitoring system that will monitor potential abuse or scams, and also it will ensure people are not being called too often,” he says. “This is very important. That we understand that this kind of technology can be potentially probably dangerous.”

“At the first stage we are not going to release it to all the public. We are going to release it in a closed alpha or beta. And we will be curating the companies that are going in to explore all the possible problems and prevent them from being massive problems,” he adds. “Our machine learning team are developing those algorithms for detecting abuse, spam and other use cases that we would like to prevent.”

There’s also the issue of verbal “deepfakes” to consider. Especially as Chernyshov suggests the platform will, in time, support cloning a voiceprint for use in the conversation — opening the door to making fake calls in someone else’s voice. Which sounds like a dream come true for scammers of all stripes. Or a way to really supercharge your top performing salesperson.

Safe to say, the counter technologies — and thoughtful regulation — are going to be very important.

There’s little doubt that AI will be regulated. In Europe policymakers have tasked themselves with coming up with a framework for ethical AI. And in the coming years policymakers in many countries will be trying to figure out how to put guardrails on a technology class that, in the consumer sphere, has already demonstrated its wrecking-ball potential — with the automated acceleration of spam, misinformation and political disinformation on social media platforms.

“We have to understand that at some point this kind of technologies will be definitely regulated by the state all over the world. And we as a platform we must comply with all of these requirements,” agrees Chernyshov, suggesting machine learning will also be able to identify whether a speaker is human or not — and that an official caller status could be baked into a telephony protocol so people aren’t left in the dark on the “bot or not” question. 

“It should be human-friendly. Don’t be evil, right?”

Asked whether he considers what will happen to the people working in call centers whose jobs will be disrupted by AI, Chernyshov is quick with the stock answer — that new technologies create jobs too, saying that’s been true right throughout human history. Though he concedes there may be a lag — while the old world catches up to the new.

Time and tide wait for no human, even when the change sounds increasingly like we do.

Jul
23
2019
--

Google updates its speech tech for contact centers

Last July, Google announced its Contact Center AI product for helping businesses get more value out of their contact centers. Contact Center AI uses a mix of Google’s machine learning-powered tools to help build virtual agents and help human agents as they do their job. Today, the company is launching several updates to this product that will, among other things, bring improved speech recognition features to the product.

As Google notes, its automated speech recognition service achieves very high accuracy rates, even on the kind of noisy phone lines that many customers use to complain about their latest unplanned online purchase. To improve these numbers further, Google is now launching a feature called “Auto Speech Adaptation in Dialogflow” (Dialogflow being Google’s tool for building conversational experiences). With this, the speech recognition tools are able to take the context of the conversation into account and hence improve their accuracy by about 40%, according to Google.
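
Auto Speech Adaptation applies this biasing automatically inside Dialogflow, but the underlying idea — nudging recognition toward phrases the current context makes likely — can be approximated by hand on the Cloud Speech-to-Text API. Here is a rough sketch; the bucket path and phrases are placeholders, and the client library surface may have changed since this was written.

```python
# Sketch of manual speech adaptation ("phrase hints") with the Cloud
# Speech-to-Text Python client; Auto Speech Adaptation does something
# similar automatically from Dialogflow's conversational context.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,      # typical narrowband telephony audio
    language_code="en-US",
    # Bias recognition toward terms a contact-center caller is likely to say.
    speech_contexts=[speech.SpeechContext(phrases=["order number", "refund", "cancel my plan"])],
)

audio = speech.RecognitionAudio(uri="gs://example-bucket/support-call.wav")  # placeholder path
response = client.recognize(config=config, audio=audio)

for result in response.results:
    print(result.alternatives[0].transcript)
```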

Speech Recognition Accuracy

In addition, Google is launching a new phone model for understanding short utterances, which is now about 15% more accurate for U.S. English, as well as a number of other updates that improve transcription accuracy, make the training process easier and allow for endless audio streaming to the Cloud Speech-to-Text API, which previously had a five-minute limit.
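
To give a sense of what streaming to the Cloud Speech-to-Text API looks like from the client side, here is a minimal sketch that streams a local file in small chunks, as a microphone would. The file path is a placeholder, and the exact client surface and per-stream limits have varied across library versions, so treat this as illustrative rather than a demonstration of the new endless-streaming mode itself.

```python
# Minimal streaming-recognition sketch with the Cloud Speech-to-Text
# Python client. Illustrative only; the "endless streaming" capability
# mentioned above is a separate mode with its own constraints.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(config=config, interim_results=True)

def audio_chunks(path, chunk_bytes=4096):
    """Yield the raw audio file in small chunks, as a live source would."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

responses = client.streaming_recognize(
    config=streaming_config,
    requests=audio_chunks("call-recording.raw"),  # placeholder file
)

for response in responses:
    for result in response.results:
        if result.is_final:
            print(result.alternatives[0].transcript)
```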

If you want to, you also can now natively download MP3s of the audio (and then burn them to CDs, I guess).


Jul
12
2017
--

Gong, an AI-based language tool to help sales and customer service reps, nabs $20M

 As artificial intelligence continues its spread into all aspects of computing, many believe that it will be the next big frontier in CRM. Today a startup called Gong.io underscores that trend: the Israeli startup, which has built a tool that uses natural language processing and machine learning to help train and suggest information to sales people and other customer service reps, has raised a… Read More

Mar
17
2014
--

With A Voice Interface API For Any App, Wit.ai Wants To Be The Twilio For Natural Language

Last year, voice technology giant Nuance quietly acquired VirtuOz, a developer of virtual assistants for online sales, marketing and support — a “Siri for the enterprise” that counted the likes of PayPal and AT&T as customers. Now, Alexandre Lebrun, the founder and CEO of VirtuOz, has taken a dive back into the startup world to launch Wit.ai, a platform and API that will let a developer… Read More

Jul
27
2013
--

Dictating your Novel

I’ve been experimenting with Dragon Dictate software. This is usually recognized to be the best commercially available speech recognition system, and you can buy it for Windows or Mac for under $100. I wanted to see how difficult it would be to dictate my novel. Wouldn’t it be great to lie on the couch with a headset?

I was extremely impressed with the quality right out of the box. The software took me through a handful of short known sentences so that it could tune to my voice. I could have kept doing this for as long as I wanted, and even gone back later to train it on phrases that it repeatedly got wrong, but for the sake of laziness I decided to do the minimum amount of training. It did remarkably well with my English accent and the fact that I’m not a clear speaker.

I spoke at a normal conversational speed, just as if it were another person. It’s very natural. Typically nothing appeared on the screen until I said comma or period, or some other punctuation. The recognition engine likes to obtain the full context of a sentence or clause before transcribing it; a fraction of a second later, the entire text appeared. I didn’t have to pause – I just kept going.

Let’s look at a couple of samples. I actually recorded myself speaking these snippets from my upcoming book, but chose not to include them since I sound horrible when recorded. Ugh! Maybe I need elocution lessons – or a better mic.

Here’s the text I read:

Moving faster than a Djinn out of a bottle, one of the creatures leapt up onto the roof of a dormer window that overhung the street. Worn tiles slid and crashed to the ground. It sprang again, pushed off the wall and landed beside the man. Talon-like fingernails flashed in the lantern light, and the wight raked the man’s forearm, shredding it.

And here’s how it emerged from Dragon Dictate:

Moving faster than a gene out of a bottle, one of the creatures leapt up onto the roof of a dorm window that overhung the street. One tile Slate and crashed to the ground. It sprang again, pushed off the wall and landed beside the man. Talent like fingernails flashed in the lantern light, and the white rate to the man’s forearm, shredding it.

Not bad! You can see exactly why it went wrong, largely because of a lack of knowledge of a creature called a wight and the pronunciation of a couple of words.

 

Here’s another sample:

“I want to be a necromancer.” Her eyes locked on mine.

“Right. Do you even know what one is?”

She rolled her eyes. “Everyone knows what you do, though I bet only half of the stories are true.”

“It’s dirty and dangerous and not at all becoming for a girl.”

 

And how it came out:

“I want to be a necromancer.” Her eyes locked on mine.

“Right. Do you even know what one is?”

She rolled her eyes. “Everyone knows what to do, though I bet only half of the stories are true.”

“It’s dirty and dangerous and not at all coming for a girl.”

Almost completely perfect.

 

That second piece was trickier because I had to say “open quote” and “close quote,” and this is one of the things that made it awkward to use. After hours of practice, I remembered most of the time (and went back and added the missing ones later), but it definitely broke my concentration. I had to say “new line” for paragraph breaks too. I could edit by telling it to select a word or phrase and then to replace or insert, but since I had to proofread everything anyway, I found it easier to make fixes using the keyboard. It was fun dictating a page or so and then going back to clean it up, but it definitely took discipline.

This leads to my final point. Apart from the overhead of the extra words (which I think could be overcome after days or weeks of practice), I just couldn’t think verbally. Neural pathways have been strengthened between the creative parts of my brain and my fingers, and that’s how I have trained my body to write. It just wasn’t natural to dictate. I had expected it to be like having a conversation, but I suspect that during the act of transcribing our creativity, our eyes are subtly scanning the paragraphs and lines we have already written to retain context – to sort of keep our mental place. Dictating took more conscious effort (perhaps because it’s unnatural), and I regularly lost my place or forgot what I had just said. I suspect this would be even worse if I had attempted to dictate into my iPhone away from my computer, without the visual cue of the screen.

So much for my dream of dictating my novel on my drive to work.

I have, however, found the perfect use for it. It works great when I want to capture setting and mood: I put on my headset, close my eyes and just say what I picture in my mind. For descriptive passages like that, it’s excellent. Dialog, not so much.

 
