SpeechActs: User Interface Research
Designing speech user interfaces (SUIs) is still an art form. One goal of the Speech Applications Group is to develop design guidelines and practical hints for developers, based on our experience developing and user-testing a range of speech applications. The material on this page is derived from the CHI '95 paper, "Designing SpeechActs: Issues in Speech User Interfaces."
Simulate Conversation
Herb Clark says that "speaking and listening are two parts of a collective activity." A major design challenge in creating speech applications, therefore, is to simulate the role of the speaker/listener convincingly enough to produce successful communication with the human collaborator. Here are some of the techniques we applied in our speech user interface design effort.
Study Human Dialogs
In most of our application development projects, we begin the design process by studying human-human dialogs in the domain of the application. In one pilot field study, for example, we analyzed telephone conversations between two sales managers and their assistant. The assistant interacted with Sun's Calendar Manager while talking to the managers. We discovered that, although she had the graphical interface in front of her, neither she nor the managers used the vocabulary from the interface. For example, relative date expressions and anaphoric references were common conversational elements that do not appear in the graphical interface. To learn more about these human dialog studies, see "Using Natural Dialogs as the Basis for Speech Interface Design."
Use Conversational Devices
Adhering to conversational conventions helps improve the speech interface. Just as in human-human dialog, grounding the conversation, avoiding explicit prompts, and using discourse cues enhance communication. For example, the use of the discourse segment pop cue, "What now?", helps to reorient users after a subdialog. Listen again to the fax subdialog within the mail application.
Our user studies demonstrated that adding this small prompt did, in fact, reorient users and help them to figure out what to say next.
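One way to picture the pop cue is as a prompt the system appends whenever a subdialog finishes and control returns to the enclosing dialog. The following is a minimal sketch under that assumption, not the SpeechActs implementation; the `Dialog` class, its method names, and the "Fax sent." wording are hypothetical.

```python
class Dialog:
    """Tracks nested dialogs and adds a pop cue when a subdialog ends."""

    POP_CUE = "What now?"

    def __init__(self, top_level: str = "mail"):
        # The stack always holds at least the top-level application.
        self.stack = [top_level]

    def enter(self, subdialog: str) -> None:
        """Start a subdialog, e.g. faxing a message from within mail."""
        self.stack.append(subdialog)

    def finish(self, closing: str) -> str:
        """Return the utterance spoken when the current dialog ends."""
        if len(self.stack) > 1:
            # Leaving a subdialog: reorient the user with the pop cue.
            self.stack.pop()
            return f"{closing} {self.POP_CUE}"
        # Leaving the top level: there is nothing to return to, so no cue.
        return closing


dialog = Dialog()
dialog.enter("fax")
print(dialog.finish("Fax sent."))  # Fax sent. What now?
```

The cue is tied to the stack pop rather than to any particular command, so every subdialog gets the same reorienting prompt without per-command wording.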
Tailor Feedback
Speaker-independent, continuous speech recognition over the telephone is still quite error-prone; therefore, feedback is essential. Verification should be commensurate with the cost of acting on a misrecognized command. We implicitly verify commands that involve presentation of data, but explicitly verify commands that might destroy data or trigger future events. For example, when reading calendar events for a particular day, the verification of the date is woven into the response. Contrast that with what happens when the user says "so long."
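The implicit/explicit split can be sketched as a lookup from command to cost, followed by a choice of response style. This is an illustrative sketch only: the command names, the cost table, and the confirmation wording are assumptions, not taken from SpeechActs.

```python
from enum import Enum


class Cost(Enum):
    LOW = 1   # command only presents data
    HIGH = 2  # command may destroy data or trigger a future event


# Hypothetical command names and cost labels, for illustration only.
COMMAND_COST = {
    "read_calendar": Cost.LOW,
    "read_mail": Cost.LOW,
    "delete_message": Cost.HIGH,
    "send_fax": Cost.HIGH,
    "hang_up": Cost.HIGH,
}


def plan_response(command: str, heard: str, result: str = "") -> str:
    """Choose between implicit and explicit verification.

    `heard` is a paraphrase of what the recognizer understood;
    `result` is the data to present for low-cost commands.
    """
    # Unknown commands are treated as high cost, to err on the safe side.
    cost = COMMAND_COST.get(command, Cost.HIGH)
    if cost is Cost.LOW:
        # Implicit: weave what was understood into the answer itself.
        return f"{heard}, {result}"
    # Explicit: ask before doing anything that is hard to undo.
    return f"Do you want to {heard}?"


print(plan_response("read_calendar", "On Wednesday", "you have a staff meeting."))
print(plan_response("hang_up", "hang up"))
```

With implicit verification, a misrecognized date costs the user only one wasted response, which they can correct by asking again. The explicit question adds a turn, which is why the sketch reserves it for commands that are hard to undo.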
Design for Errors
Recognition errors are inevitable. In our user study, we found that people become frustrated very quickly with these errors, particularly if the error feedback is repetitive. One user said, "It was repetitive when it didn't understand what I said--then it turned into a machine."
We redesigned the error messages using a technique we call progressive assistance. After the redesign, a user said "It gave me the perception that it's trying to understand what I'm saying."
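This page does not spell out how progressive assistance works. One plausible reading, assumed here, is that the system begins with a terse error prompt and offers more detailed help each time recognition fails again in a row. The sketch below follows that reading; the prompt wording and the `ErrorPrompter` class are hypothetical.

```python
# Hypothetical prompts, ordered from terse to detailed.
ASSISTANCE_LEVELS = [
    "Sorry?",
    "Sorry, please rephrase.",
    "I still didn't understand. Try a short command such as 'read my mail'.",
]


class ErrorPrompter:
    """Escalates error feedback while recognition keeps failing."""

    def __init__(self, levels=ASSISTANCE_LEVELS):
        self.levels = list(levels)
        self.consecutive_errors = 0

    def on_success(self) -> None:
        """A successful recognition resets the escalation."""
        self.consecutive_errors = 0

    def on_error(self) -> str:
        """Return the next error prompt, staying at the most detailed level."""
        index = min(self.consecutive_errors, len(self.levels) - 1)
        self.consecutive_errors += 1
        return self.levels[index]


prompter = ErrorPrompter()
print(prompter.on_error())  # Sorry?
print(prompter.on_error())  # Sorry, please rephrase.
```

Because the counter resets on success, an isolated misrecognition always gets the shortest prompt, and the longer guidance appears only when the user is stuck.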
It is also helpful to provide a safety net to take common types of errors into account. For example, in applications that allow users to create recorded messages, users often start speaking the content of their message too soon. In the Office Monitor application, the system asks, "Do you want to leave a message?" Some users will compliantly say "Yes" or "No," as in this example:
More often, however, users just start speaking the message instead of answering "Yes" or "No." The Office Monitor application was designed to handle both cases. At the prompt, the system turns on both the speech recognizer and the recording mechanism. If the recognizer returns an error, the system assumes that the user spoke a message.
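The decision logic described above can be sketched as a single function that receives both the recognizer result and the recording made during the same prompt. The data structure, action names, and the re-prompt fallback for unexpected recognized words are assumptions for illustration; only the rule "recognition error means the user spoke a message" comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PromptResult:
    recognized: Optional[str]  # recognizer output, or None on a recognition error
    audio: bytes               # everything recorded while the prompt was open


def handle_leave_message(result: PromptResult) -> Tuple[str, Optional[bytes]]:
    """Decide what to do after asking "Do you want to leave a message?"."""
    if result.recognized is None:
        # Recognition failed: assume the user started speaking the message
        # and keep the recording that was running in parallel.
        return ("save_message", result.audio)
    answer = result.recognized.strip().lower()
    if answer == "yes":
        return ("prompt_for_message", None)
    if answer == "no":
        return ("skip_message", None)
    # Hypothetical fallback for any other recognized phrase: ask again.
    return ("reprompt", None)


print(handle_leave_message(PromptResult(None, b"Hi, call me back at noon.")))
```

Running the recorder alongside the recognizer is what makes the safety net possible: had recording started only after a "Yes," the opening words of an eagerly spoken message would already be lost.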
Taper Presentation of Data
Since speech is such a slow output medium, it is important to be as brief as possible. Tapering the presentation of repetitive data can cut down on the length of the speech output. Here is how extraneous words are eliminated when SpeechActs reads calendar appointments or the status of a stock portfolio.
On Wed Sept 28th, From 10:00 to 11:00, you have, Staff Meeting.
From 11:00 to 12:00, Meeting with Bob.
From 4:00 to 5:00, Beer Bust...
Your portfolio status as of an hour ago:
Sun was trading at 28 and 3/8, down 1/4 since yesterday.
IBM was at 69 and 5/8, down 5/8.
Apple was at 33 and 7/8, down 1/4.
In the recorded demo, listen to how natural it sounds when implied words are dropped after a pattern for presentation has been established.
When tapered presentations are still too long, users are able to interrupt the synthesizer with their voice or with a telephone key.
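Tapering of the calendar readout can be sketched as follows: the first item carries the full framing ("On ..., you have, ..."), and later items keep only what changes. The function name and tuple layout are hypothetical; the phrasing and comma placement mirror the calendar transcript above.

```python
from typing import List, Tuple

Appointment = Tuple[str, str, str]  # (start time, end time, title)


def taper_appointments(day: str, appointments: List[Appointment]) -> List[str]:
    """Build spoken lines, dropping implied words after the first item."""
    lines = []
    for i, (start, end, title) in enumerate(appointments):
        if i == 0:
            # The first item establishes the pattern with the full framing.
            lines.append(f"On {day}, From {start} to {end}, you have, {title}.")
        else:
            # Later items omit the day and "you have", which are now implied.
            lines.append(f"From {start} to {end}, {title}.")
    return lines


for line in taper_appointments(
    "Wed Sept 28th",
    [
        ("10:00", "11:00", "Staff Meeting"),
        ("11:00", "12:00", "Meeting with Bob"),
        ("4:00", "5:00", "Beer Bust"),
    ],
):
    print(line)
```

The same pattern applies to the portfolio readout, where "was trading at" shortens to "was at" and "since yesterday" is dropped after the first stock.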
Take Personality into Account
In designing SpeechActs, we did not set out to create a computer character with personality. Our experience, however, suggests that, like it or not, people attribute personality traits to a speech-only system. In a user study conducted in July 1994, we asked participants to complete 22 tasks and to answer a set of questions. When users were asked to describe the personality of SpeechActs, comments included: "Friendly," "Benign," "Quirky," "Empty."
"It didn't seem like I was talking to a computer after a while."