Shocking news: Voice to text that works
If you’ve been to a trade show over the last five or six years, you must have seen the demos of software that promises to convert your spoken words into characters in a word processing program. With a little bit of “training” on a single voice, these programs were often capable of rendering prose that, while both surreal and erratically spelled, bore at least a passing resemblance to what you said.
So Rogers Wireless’s announcement of a voice-mail-to-text-message service powered by SpinVox left me, shall we say, sceptical. If software that is actually trained to a voice is erratic, how can you expect a machine to recognize and transcribe the voice of any old stranger who leaves you a voice mail?
Well, quelle surprise … The service actually works. Well, I might add.
I threw a few challenges at it after opening with a rather banal “just calling to see if this works” message. It handled the message from an accented friend almost perfectly (though, to be fair, she speaks more clearly and intelligibly than most “unaccented” people I know), stumbling only over her e-mail address — who is this provider “Synthatical,” anyway? But it did recognize it as an e-mail address, and rendered it accordingly.
Another friend left a message full of “hmms,” “mmms” and “yums,” all of which were decoded perfectly. So I upped the ante, throwing a few nasty words at it (it apparently has a quite expansive dictionary) and an “a’ight?” which it correctly, though humourlessly, spelled “alright.” (It also turned my “I’m Audi” into “I’m out of here.”)
Then I threw some French at it, which it also managed well, considering my butchery of the language and accent. Quelle surprise, indeed. Still waiting for a phone call from a Spanish-speaking friend who rolls her Rs hysterically, but so far, it hasn’t been fazed by anything.
Considering the scope of the job, my one criticism, that neither phone numbers nor e-mail addresses come up as actionable links, seems picky, especially since the phone Rogers sent was refurbished and hardly high-end (and, in fact, can’t seem to keep the right time and date).
So, colour me impressed. The recognition engine has obviously got something going for it, something I’d like to see in a desktop program. It might make transcribing my interviews much less painful.
