Introduction to Voice XML Part 5: Voice XML Meets Web 2.0
This is the last article in our series on Voice XML, so let’s wrap things up by expanding your thinking about how you can use voice as an enabler for your applications. When we look at the classic success stories of Voice XML (for example, the American Airlines flight service or 1.800.DOMINOS), we see Voice XML as a front end to large database-driven applications, where voice input plays much the same role as the classic HTML form. Data is collected from a user and delivered to a server, where some back-end database processing occurs. The resulting information is then delivered back to the end user on the voice client. This is what I call the classic voice application.
But let’s expand our thinking a bit in light of the ubiquity of cell phones and the emergence of Web 2.0, where applications are assembled by connecting software components in new and often exciting ways. The latest term for this is mashup, a word intended to reflect the kind of guerilla assembly process that’s driving the creation of a new generation of web apps that are built around web services.
So where does Voice XML fit in? The answer is that it fits wherever you need a voice component to drive or augment your application. The ease of setting up free developer accounts with Voice XML providers such as Voxeo enables you to begin experimenting with voice for your own apps, hopefully leading to the next great mashup idea.
To give you some food for thought, let’s look at how we can use JavaScript to increase the intelligence of our voice apps and then explore how a little XML data on a server can go a long way toward helping generate dynamic Voice XML content for you or your users.
JavaScript and Voice XML
JavaScript has been getting a lot of attention lately in the Web 2.0 zone as the key ingredient for doing client-side AJAX. The good news is that much of that JavaScript expertise can be leveraged in your Voice XML apps. Technically, we’ll be looking at ECMAScript, the international "JavaScript" standard that has been adopted as the scripting language for Voice XML.
One of the benefits of ECMAScript is that you can access Voice XML variables from within your scripts. Elements that accept the expr attribute can use arbitrary ECMAScript code to generate a value at runtime. And you can collect your commonly used ECMAScript code into functions or libraries to support reuse across your Voice XML pages.
Some key things to note about JavaScript include the following:
- Voice XML variables are equivalent to ECMAScript variables. Voice XML variables can be passed to JavaScript functions. Values returned from functions can be stored in Voice XML variables.
- The expr attribute available with many tags can not only refer to Voice XML or ECMAScript variables but can also include ECMAScript function call expressions.
- ECMAScript can be placed inline in the Voice XML document using the <script> element, or scripts can be loaded from a URI.
- ECMAScript functions follow the familiar scope hierarchy: application -> document -> dialog.
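To make these points a little more concrete, here is a minimal sketch of the kind of helper you might keep in a script file and load from a URI. The function name greetingFor and the idea of passing it an hour of the day are illustrative assumptions, not part of any Voice XML platform. A prompt could then call it from an expr attribute, for example <value expr="greetingFor(hour)"/>, where hour is a hypothetical Voice XML variable holding a number from 0 to 23.

```javascript
// Hypothetical reusable helper (the name and behavior are assumptions
// chosen for illustration). Written in plain ES3-style JavaScript so it
// stays within what older ECMAScript engines accept.

// Returns a spoken greeting suited to the hour of day (0-23).
function greetingFor(hour) {
  if (hour < 12) {
    return "Good morning";
  }
  if (hour < 18) {
    return "Good afternoon";
  }
  return "Good evening";
}
```

Because the value returned by the function is an ordinary ECMAScript value, it could just as easily be stored in a Voice XML variable and reused later in the dialog.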
Let’s begin by writing a simple JavaScript function called multiply that returns the product of two numbers. Imagine that you’re driving along the highway, wearing your cell phone headset, and you need to do a quick calculation. You trigger a call to your voice application and the following dialogue ensues:
C: What’s number one?
U: 2 2 3
C: What’s number two?
U: 4 2 3 4
C: The product of 223 and 4234 is 944,182
This bit of mobile math is accomplished by using Voice XML’s <script> element to enclose a JavaScript function that performs the calculation:
<script>
  <![CDATA[
    function multiply(v1, v2) {
      return v1 * v2;
    }
  ]]>
</script>
As you can see, the JavaScript part of the code is quite simple. It’s embedded inside a CDATA block so we can use characters that might upset an XML parser. For example, if we want to use a less-than sign (<) as part of our script, a parser will get confused thinking that we are starting off a new element and return a nasty parsing error. By enclosing our scripts in CDATA, we’re free to use JavaScript constructs without fear of parser retaliation.
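As a small illustration of why the CDATA wrapper matters, consider a function that uses both a less-than sign and an ampersand, two characters an XML parser treats as markup. The name inRange and its parameters are hypothetical, chosen only for this sketch.

```javascript
// Hypothetical helper: the name and parameters are assumptions.
// The <= and && operators below contain "<" and "&", which an XML
// parser would read as markup if the script were not inside CDATA.
function inRange(value, low, high) {
  return low <= value && value <= high;
}
```

Placed inside a CDATA block, this function passes through the parser untouched. Without CDATA, you would have to escape the operators as &lt;= and &amp;&amp;, which quickly makes scripts hard to read.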