The Voice Web
VoiceXML builds on existing data networking standards, such as XML, HTTP, and TCP/IP, and on the telephony standards used in the Public Switched Telephone Network (PSTN) and the Integrated Services Digital Network (ISDN).
The voice web consists of the PSTN, VoiceXML applications on the Internet, and a VoiceXML gateway between the Internet and the PSTN. The VoiceXML gateway hosts specialized hardware and software that enable voice browsing. Some of these resources, such as automatic speech recognition (ASR) and text-to-speech (TTS), may be located on separate network elements and accessed remotely.
In a voice browser session, the user's phone call goes over the PSTN to a VoiceXML gateway. Based on the number that the user dialed, the gateway downloads and possibly caches the corresponding VoiceXML application from the Internet. The gateway then steps through the VoiceXML, interacting with the user as defined in the application.
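The lookup-and-fetch step of the session described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name GatewaySketch, its method names, the phone number, and the example URL are all hypothetical, and the cache here never expires, whereas a real gateway would be expected to honor HTTP caching rules.

```python
from typing import Callable, Dict, Optional
from urllib.request import urlopen


def fetch_over_http(url: str) -> str:
    """Download a document over HTTP and decode it as UTF-8 text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class GatewaySketch:
    """Hypothetical model of the gateway's lookup step; not a real gateway API."""

    def __init__(
        self,
        number_to_url: Dict[str, str],
        fetch: Callable[[str], str] = fetch_over_http,
    ) -> None:
        self.number_to_url = number_to_url  # dialed number -> application URL
        self.fetch = fetch                  # injectable so it can be stubbed
        self.cache: Dict[str, str] = {}     # URL -> downloaded VoiceXML text

    def application_for(self, dialed_number: str) -> Optional[str]:
        """Return the VoiceXML for a dialed number, or None if it is unmapped."""
        url = self.number_to_url.get(dialed_number)
        if url is None:
            return None
        if url not in self.cache:
            # First call for this application: download it and keep a copy.
            self.cache[url] = self.fetch(url)
        return self.cache[url]
```

A gateway built this way would call application_for once per incoming call, for example GatewaySketch({"8005550100": "http://example.com/app.vxml"}).application_for("8005550100"), and then interpret the returned document. Interpreting the VoiceXML itself is outside the scope of this sketch.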
A typical voice network is shown in Figure 1.
Figure 1 VoiceXML network interactions.
The network diagram contains the following elements:
Caller Telephone. The telephone that the caller uses to access a VoiceXML application. The figure illustrates a call over the PSTN.
VoiceXML Gateway. A gateway that bridges the PSTN and IP worlds and hosts the VoiceXML browser, the speech hardware and software, and the mapping from dialed numbers to URLs.
Web Server. The server that hosts a VoiceXML application in this network. Any web server can deliver VoiceXML once its MIME type configuration has been edited to include the VoiceXML media type.
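As an illustration of the MIME type change mentioned for the web server, the following sketch uses Python's standard http.server module to serve files ending in .vxml with the media type application/voicexml+xml, which is the type registered for VoiceXML. The .vxml extension, the class name VoiceXMLHandler, and the port are assumptions made for the example, and some deployments may expect a different media type.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class VoiceXMLHandler(SimpleHTTPRequestHandler):
    """Static file handler that also knows the VoiceXML media type."""

    # Copy the default extension table, then add an entry for VoiceXML files.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".vxml": "application/voicexml+xml",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory on the given port until interrupted."""
    ThreadingHTTPServer(("", port), VoiceXMLHandler).serve_forever()
```

Calling serve() from a directory that contains VoiceXML documents would make them available to a gateway over HTTP. Production servers have their own configuration mechanism for the same mapping, so this is a sketch of the idea and not a recommended deployment.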
The diagram involves the following networks:
PSTN. The Public Switched Telephone Network, which delivers what is often called Plain Old Telephone Service (POTS). This is the telephone service most of us have in our homes. It carries the audio of the call in both directions: prompts played by the VoiceXML gateway, and the caller's spoken and DTMF (touch-tone) responses.
Internet. The Internet, to which an Internet service provider (ISP) provides access for a fee. It carries the gateway's request for VoiceXML to the web server and carries the resulting VoiceXML document back to the gateway.
Note that this network has a fairly traditional architecture that doesn't capture scenarios involving Voice over Internet Protocol (VoIP).
To summarize important points about the network, the voice web consists of the PSTN, VoiceXML applications on the Internet, and a VoiceXML gateway between the Internet and the PSTN. The VoiceXML gateway hosts specialized hardware and software that enable voice browsing.