- Introduction
- Spoken Language of the Deployment Region
- Grammar Definition
- Speech Recognition and Text-to-Speech Engine Selection
- VoiceXML Specification Support
- Data Provisioning
- Content Provider Interface
- Performance, Capacity, and Reliability
Speech Recognition and Text-to-Speech Engine Selection
As mentioned, the spoken language of the end users has a profound impact on the effectiveness of the ASR and TTS engines, so pay close attention to the engines the voice portal uses. In the best case, the ASR and TTS engines were developed from the same language model. In practice, the two engines rarely come from the same manufacturer, so expect some differences between them.
Study the documentation for the speech-processing engines used by the voice portal. Look for the languages supported, the testing and verification strategy used by the speech vendor and the portal provider, and the type of ambient noise model used for the ASR engine. The last point is important because some ASR vendors base their models on recordings made with a studio microphone, which do not reflect the conditions of a telephone call. Ideally, the engine uses a wireline telephony model for wireline calls and a wireless (cellular) model for wireless calls. Each model is created by factoring in the characteristics that its type of network introduces.
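The matching rule above (wireline model for wireline calls, wireless model for wireless calls) can be sketched as a simple lookup. This is only an illustration: the `NetworkType` enum, the model identifiers, and the `select_acoustic_model` function are hypothetical names, not part of any vendor's API, and the fallback to the wireline model for calls of unknown origin is an assumption rather than a recommendation from the speech vendors.

```python
from enum import Enum


class NetworkType(Enum):
    """Type of network the incoming call arrived on (hypothetical classification)."""
    WIRELINE = "wireline"
    WIRELESS = "wireless"
    UNKNOWN = "unknown"


# Hypothetical model identifiers. Real names come from the ASR vendor's documentation.
ACOUSTIC_MODELS = {
    NetworkType.WIRELINE: "telephony-wireline",
    NetworkType.WIRELESS: "telephony-wireless",
}

# Assumption: when the network type cannot be determined, use the wireline model.
DEFAULT_MODEL = ACOUSTIC_MODELS[NetworkType.WIRELINE]


def select_acoustic_model(network_type: NetworkType) -> str:
    """Return the acoustic model identifier that matches the call's network type."""
    return ACOUSTIC_MODELS.get(network_type, DEFAULT_MODEL)
```

Whether a portal can make this choice per call depends on whether the platform exposes the originating network type and whether the ASR engine allows the model to be switched at call setup. Both are worth confirming with the portal provider.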