ACM Interactions, Volume 3, Number 6, November/December 1996.
Copyright 1996 Sun Microsystems, Inc.
Sidebars
How Do Users Know What to Say?
Nicole Yankelovich
Sun Microsystems Laboratories
Some Speech Technology Terms
Following are definitions of speech-related terms used in this column as
well as some suggestions for learning more about speech technologies.
Speech Recognition
The capability of a computer to take spoken input from the user and convert
it to text.
Speech Recognizer
A software system (sometimes with hardware components) that processes a
digital audio signal from a file, microphone, or telephone. The recognizer
searches for the word sequences that best fit the signal, constrained either
by a frequency table of word sequences or by a predefined grammar
specification. The output is one or more strings of text, often with
associated measures of how well each string fits.
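The toy sketch below shows how a frequency table can rank competing word
sequences and return an n-best list with scores. It is only an illustration
under stated assumptions: the bigram counts and candidate strings are
hypothetical, smoothing is simple add-one, and the acoustic matching that a
real recognizer combines with these scores is left out entirely.

```python
import math

# Hypothetical bigram frequency table: how often word B followed word A.
# "<s>" marks the start of an utterance. Real tables are built from large corpora.
BIGRAM_COUNTS = {
    ("<s>", "send"): 8, ("send", "this"): 6, ("this", "to"): 6,
    ("to", "the"): 9, ("the", "back"): 5, ("the", "bat"): 1,
    ("<s>", "sand"): 1,
}
VOCAB = {w for pair in BIGRAM_COUNTS for w in pair}


def log_score(words):
    """Log-probability of a word sequence under an add-one-smoothed bigram model."""
    tokens = ["<s>"] + words
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Number of times `prev` was followed by any word at all.
        prev_count = sum(c for (p, _), c in BIGRAM_COUNTS.items() if p == prev)
        count = BIGRAM_COUNTS.get((prev, word), 0)
        total += math.log((count + 1) / (prev_count + len(VOCAB)))
    return total


def n_best(candidates, n=2):
    """Return the n highest-scoring candidates as (text, score) pairs."""
    scored = [(" ".join(c), log_score(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical strings that an acoustic stage might confuse with one another.
hypotheses = [
    "send this to the back".split(),
    "sand this to the bat".split(),
    "send this to the bat".split(),
]
for text, score in n_best(hypotheses, n=3):
    print(f"{score:8.3f}  {text}")
```

The score attached to each string plays the role of the "measure of how well
each string fits" described above; an application might act on the top string
only when its score clearly beats the runner-up.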
Discrete Speech
Some speech recognizers only allow users to speak one word or short phrase
at a time, such as "Calendar" or "Send-to-back."
Continuous Speech
Other speech recognizers allow users to speak connected streams of words,
although usually only one sentence at a time, for example, "I would like my
calendar" or "Send this to the back."
Vocabulary
The words or phrases a speech recognizer can hear. Both discrete and
continuous speech recognizers have a word list, sometimes called a lexicon.
Grammar
A specification used by some continuous speech recognizers for how words in
the vocabulary can be strung together.
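A minimal sketch of what a grammar adds on top of a vocabulary, assuming a
hypothetical command grammar written as a Python dictionary (real recognizers
each use their own grammar format). Every word in "send to the back this" is
in the vocabulary, but the grammar does not allow that word order, so the
recognizer would not accept it.

```python
from itertools import product

# Hypothetical grammar: each nonterminal (in angle brackets) maps to a list of
# alternatives, and each alternative is a sequence of symbols. Any symbol that
# is not a key in the dictionary is a literal word.
GRAMMAR = {
    "<command>": [["<verb>", "<object>", "to", "the", "<place>"]],
    "<verb>": [["send"], ["bring"], ["move"]],
    "<object>": [["this"], ["it"], ["the", "window"]],
    "<place>": [["back"], ["front"]],
}


def expand(symbol):
    """Return every word sequence (as a tuple) that `symbol` can produce."""
    if symbol not in GRAMMAR:  # a literal word expands to itself
        return [(symbol,)]
    sequences = []
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol, then join every combination of the expansions.
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            sequences.append(tuple(word for part in combo for word in part))
    return sequences


def vocabulary():
    """The word list (lexicon) implied by the grammar."""
    return {s for alts in GRAMMAR.values() for alt in alts for s in alt
            if s not in GRAMMAR}


def in_grammar(sentence):
    """True if the spoken word string is one the grammar allows."""
    return tuple(sentence.lower().split()) in set(expand("<command>"))


print(in_grammar("Send this to the back"))   # True
print(in_grammar("Send to the back this"))   # False: same words, wrong order
print(len(expand("<command>")))              # 18 sentences (3 x 3 x 2)
```

Enumerating every sentence only works for small, non-recursive grammars like
this one; it is used here because it makes the set of allowed phrases easy to
inspect.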
Dictation
The use of speech for entering free text, as in a memo or the body of an
electronic mail message. Commercially available dictation systems currently
all use discrete speech recognition, but with very large vocabularies.
Dictation systems are distinct from command-and-control systems, which are
designed to allow the user to issue commands and to control application
behavior.
Speech Output
The computer can either play recorded audio messages or convert text to
speech using a speech synthesizer. Recorded audio provides higher-quality
output, but a synthesizer is almost always used when the content of the
output is not known ahead of time.
Prompt
A recorded or synthesized message produced by the system for the user.
Speech-only Interface
An application interface that has no input or output mechanism other than
speech. Telephone-based applications are the most common speech-only
interfaces.
Multimodal Application
In the context of this article, an application that uses any number of
input and output modalities, including speech.
Books That Include Some Speech Design Issues
Baber, Christopher, and Noyes, Janet M. (eds.). Interactive Speech
Technology: Human Factors Issues in the Application of Speech Input/Output
to Computers. Taylor & Francis Ltd., London, 1993.
Schmandt, Christopher. Voice Communications with Computers. Van Nostrand
Reinhold, New York, 1994.
Web Sites with General Information about Speech Technology
Speech Demonstrations Over the Telephone
Note: When calling these numbers, listen to the prompts and think about
whether and how they could be improved.
CheckFree Corporation
(800) 392-0743
Electronic bill payment.
Linkon Demonstration Hotline
(800) 793-3667
Voice fax-on-demand and text-to-speech demos using the Lernout & Hauspie recognizer.
Nortel StockTalk
(514) 765-7862
Speech system for stock quotations.
Voice Control Systems (VCS)
(214) 404-9405
Alphabet Recognition Demo
VCS Barge-In
(214) 404-0777
Demonstrates the user's ability to interrupt speech output.
VCS Connected
(214) 490-0767
Connected Digit Recognition Demo
VCS Credit Card
(214) 490-1210
Credit Card Validation Demo
Voice Processing Corporation (VPC)
(617) 577-8422
Demos include Voice Dial, Auto Attendant, Credit Card, and Continuous Digits.
Wildfire Communications, Inc.
(800) 945-3347
The demo includes little speech recognition, but you can listen to a simulated
session between a Wildfire user and the system.