Entering Data by Speaking on the Telephone: Three Problems and What to Do about Them
- Problem 1: Callers Don't Know When to Speak and What to Say
- Problem 2: Speech Recognition Errors Break the Dialog's Rhythm and Slow Down the Dialog Between the Caller and the Computer
- Problem 3: When Detected, Callers Cannot Easily Correct Misunderstandings
- Other Problems
Telephones and cell phones are everywhere. Users call from wherever they are, whenever they want. But when calling businesses, callers are frequently put on hold. To shorten long hold queues, several businesses have installed interactive voice response (IVR) systems that collect data by asking callers carefully formatted questions. Some systems require callers to answer by pressing the buttons on touchtone phones. These systems have several disadvantages:
Callers must constantly move the phone between their ear and the front of their face to see the buttons they need to press. Speech recognition removes this need.
Callers must translate options to numbers. For example, "For accounting, press 1; for human resources, press 2; for sales, press 3..." Callers must select the appropriate option and then translate it to a number before pressing the corresponding button on the telephone keypad. Speech recognition simplifies this process: callers simply speak the answer rather than remembering the options, selecting the best one, and, finally, translating the selection to a number.
Because human short-term memory is limited, developers keep each menu to a few options, which makes menu hierarchies deep and narrow rather than shallow and broad. Callers often "get lost" in these deep hierarchies and cannot find their way to the desired option.
Speech-recognition technology solves these problems but creates some new ones, as new technology always does. This article discusses three problems with using telephones to enter data by speaking, and what you can do about them.
Problem 1: Callers Don't Know When to Speak and What to Say
Currently, many callers do not have experience using a telephony application with automatic speech recognition. They don't know that they can speak. Often, they are "tongue-tied" about what to say. Here are some hints to help callers say the right thing at the right time:
Inform the caller that they may speak. At the beginning of the application, inform the caller that the application can understand human speech and that they should respond to questions by speaking the answers. For example:
"Welcome to the Ajax banking application. You may answer questions by speaking directly into your phone."
Encourage the caller to respond to a prompt by speaking. Phrase the prompt to encourage the caller to speak. Use words in the prompt such as "say" or "speak" instead of "enter." For example:
"Say your name."
rather than
"Enter your name."
Tell the caller what to say. Prompts should lead the callers to say words and phrases in the corresponding grammar. If the caller is not familiar with the appropriate words and phrases, include them as part of the prompt. For example:
"Which account? Savings or checking?"
If the caller is already familiar with the individual words, shorten the prompt. For example:
"Which day of the week?"
instead of:
"Which day? Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, or Saturday?"
Encourage experienced callers to barge in. Novice callers usually listen to the entire prompt. They need to hear all of the instructions and options before making their selection. However, experienced callers may resent listening to complete prompts, especially if they use the application frequently. In conversations between people, barging in may be rude. However, computers are never insulted when callers barge in. Inform callers that they may bypass lengthy prompts by "barging in," that is, speaking before the prompt ends. For example:
"You may speak at any time, even if the computer is speaking."
Insert pauses in the prompt wording where expert, average, and novice callers may speak. A pause signals callers that they may speak. Callers with different experience levels may barge in during different pauses. For example:
"Color?"
(Pause, so experts can barge in here. They know the question and the appropriate responses.)
"Say the color you want."
(Pause, so an average caller who already knows the allowable options can barge in.)
"Green, red, or blue?"
(The novice caller responds after hearing the allowable options.)
Callers quickly learn that barging in speeds up the conversation and lets them complete their tasks sooner.
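One way to represent the tapered color prompt above is as a list of segments, each followed by a pause. The Python sketch below is a minimal illustration under stated assumptions: the `Segment` class, the helper names, and the pause lengths are all hypothetical, and the output uses the SSML `<break>` element on the assumption that the speech platform accepts SSML. A complete SSML document would also carry version and namespace attributes on `<speak>`, which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Sequence
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Segment:
    text: str             # words the system speaks
    pause_seconds: float  # silence afterwards, where a caller may barge in


# The color prompt from the text; the pause lengths are assumed values.
COLOR_PROMPT = [
    Segment("Color?", 1.0),                   # experts answer after this
    Segment("Say the color you want.", 1.0),  # average callers answer here
    Segment("Green, red, or blue?", 3.0),     # novices answer here
]


def to_ssml(segments: Sequence[Segment]) -> str:
    """Render the segments as simplified SSML with a <break> for each pause."""
    parts = []
    for segment in segments:
        parts.append(escape(segment.text))
        parts.append(f'<break time="{int(segment.pause_seconds * 1000)}ms"/>')
    return "<speak>" + "".join(parts) + "</speak>"


def heard_before_reply(segments: Sequence[Segment], reply_in_pause: int) -> List[str]:
    """Return the text a caller hears when replying in the given pause (0-based)."""
    return [segment.text for segment in segments[: reply_in_pause + 1]]


print(to_ssml(COLOR_PROMPT))
print(heard_before_reply(COLOR_PROMPT, 0))  # what an expert caller hears
```

Ordering the segments from shortest to most explicit means that the callers who need the least help also wait the least.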
Continue to encourage the caller to speak. Sometimes, the caller says a word that is not covered by the grammar of allowable words. In these cases, a useful strategy is to reveal additional information and instruction each time the caller is prompted for the same information. For example:
Level 1. Present a short prompt, asking the caller to respond.
Level 2. Present a short description of what the caller should say.
Level 3. Present an example of what the caller should do.
Level 4. Offer to present short segments of a verbal tutorial to the caller, or transfer the caller to a human operator to resolve the caller's problem.
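The four levels above can be sketched as a list of prompts indexed by the number of failed attempts. This Python example is a simulation, not a real recognizer: the prompt wording, the function names, and the set-based "grammar" are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical wording for the four levels, using the account question.
ACCOUNT_PROMPTS = [
    "Which account?",                                    # Level 1: short prompt
    "Say the name of the account you want to use.",      # Level 2: description
    "For example, say 'savings' or 'checking'.",         # Level 3: example
    "Say 'help' to hear a short tutorial, or say "
    "'operator' to speak with a person.",                # Level 4: tutorial or transfer
]


def prompt_for_attempt(prompts: Sequence[str], failed_attempts: int) -> str:
    """Pick the prompt for this attempt; stay at the last level once it is reached."""
    return prompts[min(failed_attempts, len(prompts) - 1)]


def collect(
    prompts: Sequence[str], grammar: Set[str], utterances: Sequence[str]
) -> Tuple[Optional[str], List[str]]:
    """Simulate a dialog: play a prompt, then check the caller's next utterance.

    Returns the recognized word (or None) and the prompts that were played.
    """
    played: List[str] = []
    failures = 0
    for said in utterances:
        played.append(prompt_for_attempt(prompts, failures))
        word = said.strip().lower()
        if word in grammar:
            return word, played
        failures += 1  # out-of-grammar reply: reveal more help next time
    return None, played


result, played = collect(ACCOUNT_PROMPTS, {"savings", "checking"}, ["um", "my money", "savings"])
print(result)
print(len(played))
```

In this run the caller fails twice, hears the Level 3 example, and then succeeds. A caller who answers correctly the first time hears only the short Level 1 prompt.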