- Problem 1: Callers Don't Know When to Speak and What to Say
- Problem 2: Speech Recognition Errors Break the Dialog's Rhythm and Slow Down the Dialog Between the Caller and the Computer
- Problem 3: When Detected, Callers Cannot Easily Correct Misunderstandings
- Other Problems
Problem 2: Speech Recognition Errors Break the Dialog's Rhythm and Slow Down the Dialog Between the Caller and the Computer
Occasionally, automatic speech-recognition systems make errors, just as humans make speech-recognition errors in daily conversations. However, you can minimize the number of errors made by a speech-recognition engine:
Avoid words in the grammar that are confusing for the speech-recognition engine. If the speech-recognition engine frequently confuses two words, modify the grammar to use different words or phrases that are easily distinguished. For example, change the vocabulary from {"green", "gray"} to {"forest green", "charcoal gray"}.
The grammar should contain words frequently spoken by callers. Wizard of Oz tests and usability tests are necessary to determine which words callers frequently speak. (During a Wizard of Oz test, the developer pretends to be the computer, asks the caller questions, and records the caller's responses.)
Keep the grammar as small as possible. If the grammar is too large, the speech-recognition system may become confused about which of several words matches the caller's utterance. This confusion may result in a mismatch event, after which the computer must ask the caller clarifying questions. Smaller grammars enable the speech-recognition engine to return accurate results more quickly.
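The three grammar guidelines above can be sketched in code. The following Python example is a minimal illustration, not a feature of any particular speech platform: the function names `frequent_words` and `confusable_pairs`, the `coverage` and `min_count` parameters, and the shape of the test logs are all assumptions. It assumes that Wizard of Oz transcripts are available as a list of caller responses and that recognition trials are logged as (spoken, recognized) pairs.

```python
from collections import Counter


def frequent_words(responses, coverage=0.9):
    """Return the most frequent caller responses, in order, until they
    account for at least `coverage` of all recorded utterances.

    Keeping only these words keeps the grammar small while still
    covering what callers actually say.
    """
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    chosen, covered = [], 0
    for word, n in counts.most_common():
        if covered / total >= coverage:
            break
        chosen.append(word)
        covered += n
    return chosen


def confusable_pairs(trials, min_count=2):
    """Return word pairs that were mistaken for each other at least
    `min_count` times, given (spoken, recognized) pairs from test calls.

    Each pair is reported once, in alphabetical order.
    """
    confusions = Counter(
        frozenset((spoken, heard))
        for spoken, heard in trials
        if spoken != heard
    )
    return {tuple(sorted(pair)) for pair, n in confusions.items() if n >= min_count}
```

Pairs reported by `confusable_pairs` are candidates for rewording, as in the "forest green" and "charcoal gray" example, while `frequent_words` suggests which responses the grammar needs to keep.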
Validate low-confidence recognitions. When the speech recognizer returns a result with a low confidence rating, confirm the result by asking the caller a yes/no question. For example, if the speech-recognition engine returns a low confidence score for the word "Austin," ask the caller to confirm: "Did you say Austin?"
If the caller mumbles or speaks a word that is not in the vocabulary, prompt the caller again, but use different wording to encourage the caller to say one of the words in the vocabulary. For example:
Prompt: Which account? Savings or checking?
Caller: Hmmm.
Prompt: Do you want to access your savings or checking account?
If the confidence scores are similar for two words in the vocabulary, ask a yes/no question about the more frequently used word. For example, suppose the most frequent answer to the question "Which color? Blue or green?" is green, and the confidence scores for blue and green are about the same. Validate by prompting the caller with the yes/no question: "Did you say green?"
Frequently, people ask these types of questions in daily conversations, and think nothing of answering these questions with a simple "yes" or "no." Most callers are never aware that a real or suspected speech-recognition error occurred. Also, callers often say the correct value after responding to a yes/no question.
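The validation strategy described above can be summarized as a small decision function. The sketch below is an assumption-laden illustration: the `Hypothesis` class, the `next_prompt` function, and the threshold values (`accept_threshold`, `reject_threshold`, `tie_margin`) are hypothetical and would need tuning against a real recognizer, whose confidence scale may differ from the 0.0 to 1.0 range assumed here.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    word: str
    confidence: float  # assumed to range from 0.0 to 1.0


def next_prompt(hypotheses, usage_frequency, reprompt,
                accept_threshold=0.8, reject_threshold=0.3, tie_margin=0.05):
    """Decide how the dialog should continue after a recognition attempt.

    Returns (action, text), where action is "accept", "confirm", or
    "reprompt". All thresholds are illustrative assumptions.
    """
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)

    # Nothing usable was recognized (mumbling or an out-of-vocabulary
    # word): ask again with different wording.
    if not ranked or ranked[0].confidence < reject_threshold:
        return ("reprompt", reprompt)

    best = ranked[0]

    # Two candidates score about the same: confirm the one that callers
    # say more often.
    if len(ranked) > 1 and best.confidence - ranked[1].confidence <= tie_margin:
        best = max(ranked[:2], key=lambda h: usage_frequency.get(h.word, 0))
        return ("confirm", f"Did you say {best.word}?")

    # A single low-confidence result: confirm with a yes/no question.
    if best.confidence < accept_threshold:
        return ("confirm", f"Did you say {best.word}?")

    return ("accept", best.word)
```

For example, calling `next_prompt` with hypotheses for "blue" and "green" that differ by less than `tie_margin`, and a `usage_frequency` in which green is more common, yields the confirmation "Did you say green?".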