SALT Enables Telephony and Multimodal Applications.
In March 2002, the SALT Forum (founded by Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks) published a specification for speech tags called Speech Application Language Tags (SALT).
The following listing shows a SALT implementation of the same application shown previously. It uses the same color scheme as the VoiceXML implementation, so you can easily compare the two. With both approaches, the developer specifies prompts (green), grammars (red), and event handlers (blue), and submits the results to a back-end database (aqua).
<!-- HTML -->
<html xmlns:salt="urn:saltforum.org/schemas/020124">
  <body onload="RunAsk()">
    <form id="travelForm">
      <input name="txtBoxOriginCity" type="text" />
      <input name="txtBoxDestCity" type="text" />
    </form>

    <!-- Speech Application Language Tags -->
    <salt:prompt id="askOriginCity"> Where would you like to leave from? </salt:prompt>
    <salt:prompt id="askDestCity"> Where would you like to go to? </salt:prompt>
    <!-- After the apology finishes playing, ask again -->
    <salt:prompt id="sayDidntUnderstand" onComplete="RunAsk()"> Sorry, I didn't understand. </salt:prompt>

    <salt:reco id="recoOriginCity" onReco="procOriginCity()" onNoReco="sayDidntUnderstand.Start()">
      <salt:grammar src="city.xml" />
    </salt:reco>
    <salt:reco id="recoDestCity" onReco="procDestCity()" onNoReco="sayDidntUnderstand.Start()">
      <salt:grammar src="city.xml" />
    </salt:reco>

    <!-- script -->
    <script>
      // Prompt for the first field that is still empty.
      function RunAsk() {
        if (travelForm.txtBoxOriginCity.value == "") {
          askOriginCity.Start();
          recoOriginCity.Start();
        } else if (travelForm.txtBoxDestCity.value == "") {
          askDestCity.Start();
          recoDestCity.Start();
        }
      }

      // Store the recognized origin city, then continue the dialog.
      function procOriginCity() {
        travelForm.txtBoxOriginCity.value = recoOriginCity.text;
        RunAsk();
      }

      // Store the recognized destination city, then submit the form.
      function procDestCity() {
        travelForm.txtBoxDestCity.value = recoDestCity.text;
        travelForm.submit();
      }
    </script>
  </body>
</html>
Unlike VoiceXML, SALT has no Forms Interpretation Algorithm. Instead, SALT tags must be embedded within a host language such as XHTML, SMIL, or a scripting language. (The preceding example uses XHTML.) Developers must specify all control and coordination functions using the host language. This approach uses a procedural style of programming, as opposed to VoiceXML's declarative style. In effect, developers implement their own version of the Forms Interpretation Algorithm, which is embedded within the code specified by the host language.
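The idea of a developer-written dialog loop can be sketched in plain JavaScript without any speech engine. The sketch below is only an illustration under stated assumptions: `makeDialog`, `scriptedRecognizer`, the field objects, and the city list are hypothetical names invented here, and the recognizer is a scripted stand-in rather than a SALT object. It mirrors what `RunAsk()` does in the listing: find the first unfilled field, prompt for it, and either store the result or apologize and ask again.

```javascript
// A minimal, hand-written dialog loop: the kind of control logic a SALT
// developer supplies in the host language. All names here are hypothetical
// stand-ins, not part of the SALT specification.
function makeDialog(fields, recognize, speak) {
  const values = {};

  // Pick the first field that has no value yet (like RunAsk() in the listing).
  function nextUnfilled() {
    return fields.find((f) => values[f.name] === undefined) || null;
  }

  function run(maxAttempts = 10) {
    let attempts = 0;
    let field;
    while ((field = nextUnfilled()) !== null && attempts < maxAttempts) {
      attempts += 1;
      speak(field.prompt);
      const result = recognize(field);
      if (result === null) {
        // No match against the grammar: apologize, then re-ask the same field.
        speak("Sorry, I didn't understand.");
        continue;
      }
      values[field.name] = result;
    }
    return { values, complete: nextUnfilled() === null };
  }

  return { run };
}

// Simulated recognizer: consumes scripted utterances and accepts only
// those that appear in the field's grammar (here, a simple word list).
function scriptedRecognizer(utterances) {
  const queue = utterances.slice();
  return (field) => {
    const heard = queue.shift();
    if (heard === undefined) return null;
    const normalized = heard.toLowerCase();
    return field.grammar.includes(normalized) ? normalized : null;
  };
}

// Assumed sample data for the travel form.
const cities = ["boston", "denver", "seattle"];
const fields = [
  { name: "originCity", prompt: "Where would you like to leave from?", grammar: cities },
  { name: "destCity", prompt: "Where would you like to go to?", grammar: cities },
];

const spoken = [];
const dialog = makeDialog(
  fields,
  scriptedRecognizer(["Boston", "mumble", "Denver"]),
  (text) => spoken.push(text)
);
const outcome = dialog.run();
console.log(outcome.values, spoken);
```

In a VoiceXML application the interpreter would make the "which field next" decision itself; here that decision lives in `nextUnfilled()` and the `while` loop, which is the trade-off the paragraph above describes.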
SALT can be used to develop telephony applications, that is, applications that callers reach with their existing telephones and cell phones. However, SALT can also be used to develop multimodal applications by embedding SALT tags into GUI-building languages such as XHTML. Embedding SALT tags in an existing GUI application enables it to speak and listen to users, as well as to display text and graphics while accepting input from a keyboard or keypad and a mouse or stylus. The result is a multimodal application.
While the VoiceXML browser generally executes on a speech server connected to a telephone, different versions of SALT browsers execute on a server or on a client, such as a handheld computer or an advanced cell phone. SALT Forum members are expected to announce the availability of SALT browsers, development tools, and other products, now that the SALT 1.0 specification has been made public this summer.
The following table summarizes the principal characteristics of both VoiceXML and SALT.
| | VoiceXML | SALT |
|---|---|---|
| Purpose | Authoring system-driven and mixed-initiative voice dialogs over telephones and cell phones | Authoring system-driven, user-driven, and mixed-initiative voice dialogs and multimodal applications |
| Target device | Interpreted on a speech server for use by telephones and cell phones | Executed on either a client or a server |
| Language style | Simple, high-level dialog markup language + ECMAScript | Lightweight markup tag extension of existing (X)HTML, WML, and scripting languages |
| Major language features | <menu>, <form>, <grammar>, and control flow elements (<if>, <elseif>, <else>, <goto>) | <listen>, <prompt>, <dtmf>, <smex>, call control and prompt queue objects |
| Control flow | Implicit, by the Forms Interpretation Algorithm | Developer specified, using HTML + ECMAScript, WML, SMIL, and so on |
| Number of tags | Many | Few |
| Grammar languages | Speech Recognition Grammar Specification, proprietary languages | Speech Recognition Grammar Specification, proprietary languages |
| Speech synthesis | Speech Synthesis ML, proprietary languages | Speech Synthesis ML, proprietary languages |
| Semantic representation | Natural Language Semantics ML | Natural Language Semantics ML |
| Call control | Call Control XML (CCXML) | Call Control XML (CCXML), call control object, <smex> |