Fiscal 1994 Project Portfolio Report





Speech Applications

Nicole Yankelovich and Paul Martin, Principal Investigators
nicole.yankelovich@East.Sun.COM
paul.martin@East.Sun.COM

Overall Objective

To build a robust, effective environment for speech applications and to help define platform-level requirements needed to support sophisticated speech interaction.

Objective for FY94

To explore both underlying architectural issues and user-level interaction techniques of speech applications, including the development of a set of tools to support others in the creation of speech applications.

Description

Speech technology can benefit users whose hands or eyes are busy, users with certain physical disabilities, and users who are away from their computers. Our initial focus has been on this last group. The prototype speech applications we have developed all allow users who are away from their desks, whether at home or on the road, to call their Sun workstation by telephone and interact by voice with several popular DeskSet applications.

To construct these speech applications, we have created a prototype speech application framework called SpeechActs, in which multiple speech-driven applications can be integrated. This framework supports multiple speech recognizers and synthesizers and includes a natural language component and a unified grammar language.

Accomplishments

The most significant accomplishment this year was the construction of the prototype SpeechActs Framework. The Framework currently supports both the Hark(TM) recognizer from BBN and the Dagger(TM) recognizer from Texas Instruments. It also supports the TruVoice(TM) synthesizer from Centigram. To effectively integrate this speech technology, we added a blackboard to store shared data; an audio server to handle headphone, speaker box, and telephone access; a text-to-speech server to provide generic interfaces for both C and Lisp processes; and a pipe protocol to rationalize Lisp-to-C process communication.
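As a rough illustration of the blackboard idea, the sketch below shows a shared store that components post to and read from, with optional notification when an entry changes. It is not the SpeechActs implementation (which combined Lisp and C processes); the class name, method names, and keys are all hypothetical.

```python
class Blackboard:
    """Hypothetical shared store: components post entries, others read or watch them."""

    def __init__(self):
        self._entries = {}    # key -> most recently posted value
        self._watchers = {}   # key -> list of callbacks to notify on post

    def post(self, key, value):
        """Store a value and notify every component watching this key."""
        self._entries[key] = value
        for callback in list(self._watchers.get(key, [])):
            callback(key, value)

    def read(self, key, default=None):
        """Return the latest value for key, or default if nothing was posted."""
        return self._entries.get(key, default)

    def watch(self, key, callback):
        """Register callback(key, value) to run whenever key is posted."""
        self._watchers.setdefault(key, []).append(callback)


# Usage: a recognizer posts an utterance; an application that watches it reacts.
board = Blackboard()
heard = []
board.watch("utterance", lambda key, value: heard.append(value))
board.post("utterance", "read my mail")
```

The point of the design is decoupling: a recognizer, a synthesizer, and an application only need to agree on key names, not on each other's interfaces.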

We also created a Unified Grammar compiler and designed a corresponding specification language as part of the SpeechActs Framework. The language is used to specify both speech recognizer and natural language grammars in a recognizer-independent way. The language formalism is an augmented pattern-matcher, using context-free, extended Backus-Naur form (BNF). The augmentations include access to features specified in a lexicon and provide for Pascal-like constraint specifications and result structure composition. Because both grammars are generated from a single specification, the Unified Grammar compiler guarantees that the speech recognition and natural language grammars are synchronized. It compiles all constraints for the natural language processor and most of the constraints for the speech recognizer.
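The following sketch illustrates the general idea of deriving a recognizer grammar and a natural language parser from one specification so that the two cannot drift apart. It does not reproduce the Unified Grammar language: the lexicon, the single rule, the constraint, and all function names are assumptions made for illustration, and for simplicity it applies the full constraint on both sides.

```python
from itertools import product

# Hypothetical lexicon: each word carries a category plus semantic features.
LEXICON = {
    "read": {"cat": "verb", "action": "read"},
    "delete": {"cat": "verb", "action": "delete"},
    "next": {"cat": "which", "index": "next"},
    "previous": {"cat": "which", "index": "previous"},
    "message": {"cat": "noun", "object": "message"},
    "appointment": {"cat": "noun", "object": "appointment"},
}

# Hypothetical rule in BNF spirit: command ::= verb which noun
RULE = ("verb", "which", "noun")


def constraint(features):
    """Illustrative constraint: appointments cannot be deleted by voice."""
    return not (features.get("action") == "delete"
                and features.get("object") == "appointment")


def merge_features(words):
    """Compose the result structure from the lexicon features of each word."""
    result = {}
    for word in words:
        for name, value in LEXICON[word].items():
            if name != "cat":
                result[name] = value
    return result


def compile_recognizer_sentences():
    """Expand the rule into the word sequences a recognizer should accept."""
    choices = [[w for w, f in LEXICON.items() if f["cat"] == cat] for cat in RULE]
    return [" ".join(words) for words in product(*choices)
            if constraint(merge_features(words))]


def parse(utterance):
    """Match an utterance against the same rule; return its features or None."""
    words = utterance.lower().split()
    if len(words) != len(RULE):
        return None
    for word, cat in zip(words, RULE):
        if word not in LEXICON or LEXICON[word]["cat"] != cat:
            return None
    result = merge_features(words)
    return result if constraint(result) else None


sentences = compile_recognizer_sentences()
```

Since both `compile_recognizer_sentences` and `parse` read the same `RULE`, `LEXICON`, and `constraint`, every sentence the recognizer side accepts is also understood by the parser side.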

To test the Framework and Unified Grammar compiler, we created five speech applications. A small login application, written in Lisp, allows users to telephone a Sun workstation, identify themselves, enter a password, and choose an application. To simplify the process of logging in, we take advantage of the telephone's caller-ID feature to try to pre-identify the user. Once logged in, the user may choose the mail, calendar, weather, or stock quotes application. Mail and calendar are implemented as C wrappers to the Mail Tool and Calendar Manager APIs so that users can access their current mail spool and calendar data files. While interacting with mail or calendar, users can mark any information that has been read aloud and ask to have the marked information faxed to them at a pre-determined location (home, work, etc.) or at a number entered using the telephone keypad. Weather and stock quotes both provide speech access to on-line data feeds. Users can check National Weather Service forecasts for locations around the United States or check the current price of technology-related stocks.
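The login dialog described above can be sketched as follows. This is only an outline of the flow (pre-identify by caller ID, otherwise ask who is calling, then check a password and offer the applications); the original login application was written in Lisp, and the user table, phone numbers, prompts, and function name here are all hypothetical.

```python
import hmac

# Hypothetical data: username -> keypad password, and phone number -> username.
USERS = {"alice": "4821", "bob": "7730"}
CALLER_ID = {"5550100": "alice"}
APPLICATIONS = ("mail", "calendar", "weather", "stock quotes")


def login(caller_number, answers):
    """Run the login dialog; return ((user, application) or None, prompts spoken)."""
    answers = iter(answers)
    prompts = []

    # Try to pre-identify the caller so the "who are you" step can be skipped.
    user = CALLER_ID.get(caller_number)
    if user is None:
        prompts.append("Who is calling?")
        user = next(answers).strip().lower()
    else:
        prompts.append(f"Hello, {user}.")

    prompts.append("Please enter your password.")
    password = next(answers)
    expected = USERS.get(user)
    if expected is None or not hmac.compare_digest(password, expected):
        prompts.append("Sorry, I could not log you in.")
        return None, prompts

    prompts.append("Which application would you like?")
    choice = next(answers).strip().lower()
    if choice not in APPLICATIONS:
        prompts.append("Sorry, I do not know that application.")
        return None, prompts
    return (user, choice), prompts


# A recognized caller skips identification; an unknown caller is asked first.
known_result, known_prompts = login("5550100", ["4821", "mail"])
unknown_result, unknown_prompts = login("5550199", ["bob", "7730", "weather"])
```

Pre-identification shortens the dialog by one exchange for recognized callers, while the password is still required in both paths.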

Designing and experimenting with these applications has led us to identify a number of important Speech User Interface (SUI) principles. The most important of these involves basing dialog design on people's natural conversational patterns rather than attempting to translate graphical user interfaces directly into speech user interfaces. The challenge is to design a dialog that lets a user carry on a natural conversation without exceeding the restrictions of the system's lexicon or grammar.

References

Publications
"SpeechActs: A Framework for Building Speech Applications," N. Yankelovich, E. Baatz, AVIOS '94 Conference Proceedings, San Jose, CA, September 20-23, 1994, SMLI 94-0243.

"SpeechActs: A Testbed for Continuous Speech Applications," P. Martin, A. Kehler, submitted to AAAI '94 Workshop on the Integration of Natural Language and Speech Processing, Seattle, WA, August 1-2, 1994, SMLI 94-0032.

"Talking vs. Taking: Speech Access to Remote Computers," N. Yankelovich, CHI '94 Adjunct Proceedings, 1994 ACM Conference on Human Factors in Computing Systems, Boston, MA, April 24-28, 1994, SMLI 94-0013.

"SpeechActs and the Design of Speech Interfaces," N. Yankelovich, CHI '94 Workshop on The Future of Speech and Audio in the Interface, 1994 ACM Conference on Human Factors in Computing Systems, Boston, MA, April 24-28, 1994, SMLI 94-0046.


Copyright 1994-2008 Sun Microsystems, Inc.