1.3 Road map to the book and other resources
While this book progresses logically from basic concepts to more difficult ones, we understand that many readers will need the information in a different order, depending on their particular task. We therefore provide this section to guide readers quickly to the information they seek.
1.3.1 How to use this book
This book is primarily intended for developers who want to write voice applications using VoiceXML. We have attempted to provide detailed language information and a broad collection of code examples. Of particular interest to the programmer will be Chapter 2, "VoiceXML essentials," Chapter 3, "VoiceXML language reference," and Chapter 4, "Enterprise voice application architecture."
This book is also intended for higher-level managers and decision makers who need to understand the risks and challenges associated with developing and deploying voice applications, as well as the role VoiceXML plays in the voice application space. Of particular interest to the manager will be this chapter and Chapter 5, "Voice services," and to a lesser degree Chapter 4, "Enterprise voice application architecture."
The following is a brief summary of the chapters of this book.
Chapter 1, "VoiceXML and voice services," introduces the terminology and basic concepts. It also gets you started with setting up a VoiceXML development environment.
Chapter 2, "VoiceXML essentials," on page 24 discusses the important constructs of VoiceXML. It covers the essential skills required to build dialogs in VoiceXML, including:
- how to collect user input,
- how to generate responses,
- how to write grammars,
- how to control dialog flow.
Chapter 3, "VoiceXML language reference," on page 136 provides an element-by-element reference to the entire VoiceXML language, including GRXML and SSML. This reference chapter is based on the VoiceXML 2.0 Specification but also contains legacy information pertaining to the VoiceXML 1.0 Specification.
Chapter 4, "Enterprise voice application architecture," on page 308 explores the integration of voice and data systems using VoiceXML with comprehensive examples. These examples are all complete and intended to provide some insight into the design process, architecture, and implementation of enterprise voice applications.
Chapter 5, "Voice services," on page 362 takes a close look at building voice services, as well as the components and protocols for deploying both local and enterprise systems. It revisits voice application design from a human-factors perspective and discusses the trade-offs of in-house application development versus outsourcing. Finally, it concludes with a look at the voice application ecosystem, including other specifications and future directions for the VoiceXML and voice application fields.
1.3.2 Terminology
This section provides a quick primer on the jargon and acronyms that pervade the voice application industry.
Telephony services terms
Dual-Tone Multi-Frequency (DTMF)
Refers to the touch tones (0-9, *, and #) on a standard telephone.
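As an illustration of why the term says "dual-tone," the following minimal Python sketch maps each keypad key to its pair of tones. The frequencies are the standard DTMF keypad values; the function and constant names are our own, chosen only for this example.

```python
# Standard DTMF keypad layout: each key sounds one low-group (row) tone
# and one high-group (column) tone at the same time. Values are in Hz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair in Hz for a single keypad key."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        # The length check rejects the empty string and multi-key input.
        if len(key) == 1 and col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


print(dtmf_frequencies("5"))  # (770, 1336)
```

A voice platform performs the reverse operation: it detects the two tones in the incoming audio and reports the corresponding key to the application.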
Public Switched Telephone Network (PSTN)
This typically means "the telephone company."
Interactive Voice Response (IVR)
This term can be used as an adjective, as in "my IVR application," or as a noun referring to the actual IVR hardware, as in "the call is passed to the IVR." A VoiceXML interpreter can be thought of as an IVR.
Communication protocols
Voice Over Internet Protocol (VoIP)
(Pronounced "Voice over IP.") This is a technology for sending voice data, such as phone calls, over an IP network, such as the Internet.
H.323
An ITU-T umbrella recommendation covering a broad family of packet-based communications protocols. H.323 specifies call signaling and media transport for carrying both voice (VoIP) and video-conferencing data.
Session Initiation Protocol (SIP)
This is a protocol for setting up, modifying, and tearing down calls. It is an IETF standard that serves as an alternative to H.323 call signaling, and it is starting to be implemented in bleeding-edge telephony products and infrastructure.
Media Gateway Control Protocol (MGCP)
A protocol that lets a call agent control media gateways, the devices that convert telephone-circuit audio into IP-based packet data.
Call Processing Language (CPL)
An XML-based language for describing and controlling Internet telephony services. It is independent of the signaling protocol and can be used with either SIP or H.323 servers.
Web protocols
Internet Protocol (IP)
A protocol used by computer applications to intercommunicate over a network.
Transmission Control Protocol (TCP)
This protocol is responsible for verifying the correct delivery of data to its destination. TCP adds support to detect errors or lost data and to trigger retransmission until data is correctly and completely received at the destination.
Hypertext Transfer Protocol (HTTP)
This is a protocol that sits atop TCP/IP (the combined TCP and IP protocols). Originally designed for browser-to-Web-server communication, it is based on a request-response paradigm and is used by a VoiceXML interpreter to communicate with a document server.
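The request-response paradigm can be sketched in a few lines of Python. This is only an illustration of what travels over the connection, not how any particular interpreter is implemented: the host name and document path are hypothetical, and the `Accept` value assumes the document server uses the `application/voicexml+xml` media type.

```python
def build_get_request(host: str, path: str) -> str:
    """Format a minimal HTTP/1.1 GET request as it appears on the wire."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: application/voicexml+xml",  # assumed media type
        "Connection: close",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line: str) -> tuple[int, str]:
    """Split a response status line into its numeric code and reason phrase."""
    _version, code, reason = line.split(" ", 2)
    return int(code), reason


# Hypothetical document server and path, for illustration only.
request = build_get_request("example.com", "/hello.vxml")
print(request.splitlines()[0])
print(parse_status_line("HTTP/1.1 200 OK"))
```

The interpreter sends a request like this for each document it needs, and the server answers with a status line, headers, and the VoiceXML document as the body.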
Common Gateway Interface (CGI)
This is a protocol for integrating dynamic Web page services with an HTTP Web server.
JavaServer Pages (JSP)
A language for embedding server-side Java code into pages served by a JSP-enabled HTTP Web server.
Active Server Pages (ASP)
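Microsoft's counterpart to JSP: a technology for embedding server-side script code into pages served by a Microsoft Web server.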
Hypertext Preprocessor (PHP)
PHP is a widely-used general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.
Document Type Definition (DTD)
XML markup declarations that define the structure and other properties of a class of XML documents.
Extensible Stylesheet Language Transformations (XSLT)
An XML-based language for describing document transformations. An XSLT stylesheet is itself an XML document that specifies how to transform a source XML document into a result document.
Namespace
A way to specify the scope of names in an XML document.
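To make the idea concrete, the sketch below uses Python's standard `xml.etree.ElementTree` module to parse a tiny VoiceXML 2.0 document. The document content is our own illustration; the namespace URI is the one VoiceXML 2.0 documents declare on their root element.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# A tiny illustrative document; the default namespace declared on <vxml>
# applies to every unprefixed element inside it.
DOCUMENT = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="hello">
    <block>Hello, world.</block>
  </form>
</vxml>"""

root = ET.fromstring(DOCUMENT)

# ElementTree reports each name qualified by its namespace.
print(root.tag)  # {http://www.w3.org/2001/vxml}vxml

# A lookup by bare name finds nothing, because "form" alone is not
# the same name as "form" in the VoiceXML namespace.
print(root.find("form"))  # None

# A namespace-aware lookup succeeds; the "v" prefix is arbitrary.
form = root.find("v:form", {"v": VXML_NS})
print(form.get("id"))  # hello
```

The same element name can therefore appear in two vocabularies within one document without ambiguity, as long as each is bound to its own namespace.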
Voice services
Automatic Speech Recognition (ASR)
A system that listens to an audio stream containing human speech and produces a symbolic representation of that speech. This can be implemented in hardware, software, or some combination of the two.
Text-To-Speech (TTS)
A system that takes a symbolic representation of speech (i.e., text) and renders it as audio.
Speaker Verification / Authentication
A technology that discerns how different people say the same words. This "voice printing" can be used to ensure a caller is who he says he is.
Form Interpretation Algorithm (FIA)
The FIA is an integral part of VoiceXML. It is the logic that drives the interaction between a user and a VoiceXML form (or menu). It controls how variables are initialized, when to enter and leave a form, which items to visit in a form, and other form-related logic. For a complete description of the algorithm, see Appendix C, "Form Interpretation Algorithm," on page 430.
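The core loop of the FIA can be sketched very roughly in Python. This is a heavily simplified illustration under our own assumptions, not the algorithm as specified: it models only fields visited in document order, and it ignores grammars, events, guard conditions, and transitions to other dialogs. The function name and the use of `None` to stand for "no input" are hypothetical conventions for this example.

```python
from typing import Iterable, Optional


def run_form(field_names: list[str],
             replies: Iterable[Optional[str]]) -> dict[str, Optional[str]]:
    """Fill a form's fields from a scripted sequence of caller replies.

    A reply of None stands for "no usable input," which causes the
    same field to be visited again.
    """
    # Initialization: every form item variable starts out unfilled.
    variables: dict[str, Optional[str]] = {name: None for name in field_names}
    caller = iter(replies)
    while True:
        # Select phase: visit the first item whose variable is unfilled.
        current = next((n for n in field_names if variables[n] is None), None)
        if current is None:
            return variables  # nothing left to visit; leave the form
        # Collect phase: prompt for the item and wait for input.
        try:
            reply = next(caller)
        except StopIteration:
            return variables  # script exhausted, e.g. the caller hung up
        # Process phase: store the result if there was one.
        if reply is not None:
            variables[current] = reply


print(run_form(["city", "date"], ["Boston", None, "Friday"]))
# {'city': 'Boston', 'date': 'Friday'}
```

In the usage line, the `None` reply leaves `date` unfilled, so the select phase chooses it again on the next pass. That repeated selection is what produces a reprompt in a real dialog.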
1.3.3 More resources
There is a growing community of VoiceXML sites on the Web. The best jump-off point is http://www.voicexml.org. This is where you can find the most recent Specification, as well as numerous links to other companies and websites.
While we have made an effort to keep this book as vendor-neutral as possible, if you want to start working with VoiceXML, you'll need to use one of the freely available products. Here are a few pointers.
http://developer.voicegenie.com
VoiceGenie makes a VoiceXML platform. They have two developer boxes that you can call into to test your applications. This service is free, but the call is a toll call to Toronto. You will need to create a login.
Voxeo is a voice-hosting service. They provide a toll-free developer system that you can call into to test your applications. This service is free. You will need to create a login.
http://www-3.ibm.com/pvc/products/voice/voice_technologies.shtml
IBM's WebSphere product is VoiceXML-enabled. They provide a free download of their WebSphere Voice Server SDK 2.0 and the WebSphere Voice Toolkit 2.0. Together, these two products provide a speech server and a full-fledged VoiceXML IDE that you can run entirely on your desktop machine (CPU permitting).
This is Nuance's developer site. Here you can download developer-licensed versions of Nuance 8 (their main ASR product), Vocalizer (their TTS product), and V-Builder (their VoiceXML IDE). Once installed, these components allow you to write and test VoiceXML on your desktop PC. These are free downloads. You may need to create a login. You may also need to download the NT patch for Vocalizer from this Nuance website.
http://www.speech.cs.cmu.edu/openvxi/OpenVXI_2.0.1/Readme.html
This is the main page for the OpenVXI project, originally started in the Carnegie Mellon University speech research group, then taken over by SpeechWorks, Inc. (http://www.speechworks.com). OpenVXI is an open-source VoiceXML interpreter.
HeyAnita is a voice-hosting service. They provide a toll-free developer system that you can call into to test your applications. This service is free. You will need to create a login.
Telera is a VoiceXML platform company. They provide a toll-free developer system that you can call into to test your applications. This service is free. You will need to create a login.