When MSJ asked me to check out the latest in speech technology from Microsoft, I popped the Microsoft Voice floppy (actually, there are two) into my multimedia machine (a 486/66 with 28MB of RAM, a SoundBlaster 16 with cheapo speakers and a $10 microphone from Radio Shack) and typed SETUP.
After the usual installation wizard stuff, I got yet another icon added to my task bar (see Figure A). Clicking it gave me the menu in Figure B. I selected Properties and got a tabbed dialog that let me control various options, the most interesting of which is what voice I wanted my computer to have (see Figure C). There are several characters to choose from, with names like Deep Douglas, Eager Eddie and Grandpa Amos, all of whom sound like they're a few days shy of full recovery from a laryngectomy. Peter is the default and least grating among them, but just for fun, I selected Wanda, who sounds like a witch with her broom in the wrong place.
When you first install Voice, you get a brief tutorial that asks you to say, "What can I say?" When you do, a list of voice commands pops up. You are then instructed to say "Close window." No matter how many times I did, that darned window just wouldn't go away! I kept getting the same sequence of ToolTip messages: "Heard. Not recognized. Please speak louder." When I yelled into the mike, I got the same sequence, sans "Please speak louder." I pressed Alt-F4 to close the window.
I fiddled around a bit: adjusted the input volume and gain, turned off my radio, held the mike close to my lips, and even "trained" Wanda to recognize my voice by repeating, at her request, the digits zero through nine plus nineteen short phrases including "Who am I?" which felt very existential. Eventually, I got it to work.
In fact, it worked pretty darn well! I was impressed. I said, "Start running Microsoft Word" in a normal voice and, sure thing, Voice launched Word! (When you install Voice, it scans your entire disk for programs and adds a "Start running X" command for every app it finds.) I said, "File New" and it created a new document. I said, "Switch to NDOS" and it switched to WinCIM. Well, that's OK, I can forgive Wanda for not knowing how to pronounce NDOS. I said "Next window" several times to cycle the windows until I got to my NDOS window. Just like pressing Alt-Tab. Wanda was able to consistently recognize other generic commands like "Close window," "Minimize window," "Press cancel," "Press enter," and "Show help."
Any time you run a program, Wanda automatically adds its menu to her repertoire, in effect turning any out-of-the-box Windows-based app into a speech app. I tried it on my TRACEWIN program from October's C/C++ column, and I was amazed that Wanda was able to recognize "Trace output off," "Trace output to window" and other TRACEWIN commands with no trouble. She's a pretty good listener, actually. Even if she can't talk too well. She had no problem recognizing my wife's voice, either, though I thought I detected a slight hint of jealousy in her responses, laryngectomy aside.
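For the curious, here is a rough Python sketch of how a command vocabulary like this might be assembled. It is only an illustration of the idea described above, not Voice's actual code: the names `build_commands` and `dispatch`, the action tuples, and the exact-match lookup are all assumptions made up for this example.

```python
# Hypothetical sketch: names and structure are illustrative only,
# not taken from Microsoft Voice.

def build_commands(installed_apps, active_menu):
    """Map lowercase spoken phrases to (action, target) pairs."""
    commands = {
        # A few of the global commands mentioned in the text.
        "what can i say": ("show_commands", None),
        "close window": ("close_window", None),
        "next window": ("next_window", None),
    }
    # One "Start running X" phrase per application found on disk.
    for app in installed_apps:
        commands[f"start running {app.lower()}"] = ("launch", app)
    # The active program's menu items become commands as well.
    for item in active_menu:
        commands[item.lower()] = ("menu", item)
    return commands


def dispatch(commands, heard):
    """Return the action for a phrase, or a ToolTip-style failure message."""
    return commands.get(heard.strip().lower(), "Heard. Not recognized.")


commands = build_commands(
    installed_apps=["Microsoft Word"],
    active_menu=["Trace output off", "Trace output to window"],
)
print(dispatch(commands, "Start running Microsoft Word"))
print(dispatch(commands, "Trace output off"))
print(dispatch(commands, "Open the pod bay doors"))
```

A real engine matches audio against the phrase list rather than comparing strings, which is where "Switch to NDOS" can turn into WinCIM.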
If you ever find yourself speechless, all you have to do is ask, "What can I say?" to get the window in Figure D, which lists everything you can say. I got global commands like "Show help" and "What can I say?" as well as TRACEWIN commands like "Trace output off."
To check out text-to-speech, I opened my draft of this text, selected the first paragraph, and said, "Read selection." Wanda read it flawlessly in her raspy monotone, which by now seemed almost tolerable. She pronounced MSJ correctly as initials, 486/66 as "four-eighty-six slash sixty-six," lowered her voice when speaking parenthetically, and even converted $10 to "ten dollars." I did not fail to notice, however, that she pronounced the word "Microsoft" with suspicious clarity, leading me to suspect a few extra "if" statements in the code; whereas "SoundBlaster" came out like "SoudBlaster," but then it turned out I had in fact misspelled it exactly that way! Now I started to feel downright uneasy: Wanda was already finding my flaws.
When I turned on keyboard commands, which let you enter text by spelling, things started turning surreal. I said, "Pee ay you el," expecting to see my name, but it came out "88d." I figured that Microsoft needed to go back to the drawing board on that one. But no, it was my fault again; you have to use international alphabet mnemonics like Alpha, Bravo, Charlie, and so on to Zebra. Fortunately, I have my pilot's license, so I know that stuff by heart. I said, "Capital-papa alpha uniform lima," pausing several seconds between each word, and, sure enough, "Paul" typed itself magically into my doc! But when I spelled "DiLascia," the Find dialog popped up because Wanda thought I said "F3." Oh well, no one ever spells my name right anyway. Wanda got it the second time, but when I reached the "s" in "DiLascia," no matter how precisely I tried to enunciate "sierra," Wanda insisted on hearing it as "zero." At first, I took it as an insult, but then I realized she was just being her typical computer self, preferring digits to letters. So I forgave her. (I think I hurt her feelings, though, because after that she would every now and then for no apparent reason ask, via her ToolTip window, "Is your microphone plugged in?" There was nothing wrong with the microphone. I like to think she was just hinting that she wanted me to say something. As Mr. Rozak says in the article, speech engines like to hear.)
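The spelling scheme itself is simple enough to sketch in a few lines of Python. This is a guess at the general idea, not how Voice implements it: the word list is the standard ICAO spelling alphabet (with "alpha" spelled as in the text), and the `spell` function and its handling of a "capital-" prefix are assumptions for illustration.

```python
# Hypothetical sketch of spelling by mnemonic; not Voice's implementation.
ICAO_WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett kilo lima "
    "mike november oscar papa quebec romeo sierra tango uniform victor whiskey "
    "xray yankee zulu"
).split()

# Each mnemonic stands for its own first letter.
LETTERS = {word: word[0] for word in ICAO_WORDS}
LETTERS["zebra"] = "z"  # the text says "Zebra"; ICAO's standard word is "Zulu"


def spell(utterance):
    """Turn 'Capital-papa alpha uniform lima' into 'Paul'."""
    out = []
    for token in utterance.lower().split():
        # Assumed convention: a "capital-" prefix uppercases the letter.
        capital = token.startswith("capital-")
        word = token[len("capital-"):] if capital else token
        letter = LETTERS.get(word)
        if letter is None:
            raise ValueError(f"not a spelling word: {token!r}")
        out.append(letter.upper() if capital else letter)
    return "".join(out)


print(spell("Capital-papa alpha uniform lima"))  # prints "Paul"
```

The table lookup is the easy part. The hard part is the one this sketch skips: deciding whether the sound that came in was "sierra" or "zero."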
If you're wondering how well Wanda performs, well, I have to say she's in no danger of winning any speed dictation trophies. At best, she can handle about one command every five or ten seconds on my 486. Also, when Wanda listens, she gobbles CPU cycles the way Arnold Schwarzenegger gobbles roast beef sandwiches. Everything turns to molasses. To avoid processor gridlock, you can set things up so you have to press a key or move the mouse to the upper-left corner of your screen to make Wanda listen.
So, what's the bottom line? Well, I definitely wouldn't use Wanda to get any real work done unless I broke both my hands, and even then I'm not sure it wouldn't be faster to type with my elbows. But there's definitely some very real and impressive technology at work here. Text-to-speech is, not surprisingly, better than speech recognition. Maybe in another couple of years. But no matter how flawless the technology becomes, you won't ever catch me talking to my computer. It seems silly. TTS seems more useful. I can see having my computer read an article back to me, and I really like the way, even today, dictionary and encyclopedia programs can pronounce words and foreign place-names. And if they could just make Wanda sound a little more like Stevie Nicks, I might not mind her occasionally asking if my microphone is plugged in.
It sure makes for great demos, though. Just be careful whom you show it to. Now whenever I ask my wife when dinner'll be ready she says: "Heard. Not recognized."
From the January 1996 issue of Microsoft Systems Journal.