Getting Started with DOM
Let's see, through examples, how to use a DOM parser. Web browsers implement DOM, so these examples run in a browser. At the time of this writing, Internet Explorer and Netscape differ in how they implement DOM.
This is partly because DOM Level 2 does not specify how to load XML documents! Loading documents is planned only for DOM Level 3. Therefore, to run these examples, make sure you use the correct browser, as indicated before each listing.
A DOM Application
Listing 7.2 is an HTML page for a JavaScript application to convert prices from U.S. dollars to euros. The price list is an XML document. The application demonstrates how to use DOM.
A slightly modified version of this page (essentially, a better-looking one) could be used in an online shop. International shoppers could then see product prices in their local currency.
CAUTION
By default, Internet Explorer supports a draft version of DOM, not the official recommendation.
Most differences are for namespace-related properties or functions. For example, DOM defines a property called localName but Internet Explorer implements baseName.
You can upgrade Internet Explorer to full standard conformance by downloading the latest Microsoft XML parser from msdn.microsoft.com/xml. If you do so, you will need to adapt the listings in this chapter.
Listing 7.2: conversion-ie5.html
<html>
<head>
<title>Currency Conversion</title>
<script language="JavaScript">
function convert(form,document)
{
   var output = form.output,
       rate = form.rate.value,
       root = document.documentElement;
   output.value = "";
   searchPrice(root,output,rate);
}
function searchPrice(node,output,rate)
{
   if(node.nodeType == 1)
   {
      // with DOM Level 2, it would be localName
      if(node.baseName == "product" &&
         node.namespaceURI == "http://www.psol.com/xbe2/listing7.1")
      {
         // with DOM Level 2, it would be getAttributeNodeNS()
         var price = node.attributes.getQualifiedItem("price","");
         output.value += getText(node) + ": ";
         output.value += (price.value * rate) + "\r";
      }
      var children, i;
      children = node.childNodes;
      for(i = 0;i < children.length;i++)
         searchPrice(children.item(i),output,rate);
   }
}
function getText(node)
{
   // i is declared locally so it does not leak into the global scope
   var children = node.childNodes,
       text = "", i;
   for(i = 0;i < children.length;i++)
   {
      var n = children.item(i);
      if(n.nodeType == 3)
         text += n.data;
   }
   return text;
}
</script>
</head>
<body>
<center>
<form id="controls">
Rate: <input type="text" name="rate" value="1.0622" size="4"><br>
<input type="button" value="Convert" onclick="convert(controls,products)">
<input type="button" value="Clear" onclick="output.value=''"><br>
<!-- make sure there is one character in the text area -->
<textarea name="output" rows="10" cols="50" readonly> </textarea>
</form>
</center>
<xml id="products">
<xbe:products xmlns:xbe="http://www.psol.com/xbe2/listing7.1">
   <xbe:product price="499.00">XML Editor</xbe:product>
   <xbe:product price="199.00">DTD Editor</xbe:product>
   <xbe:product price="29.99">XML Book</xbe:product>
   <xbe:product price="699.00">XML Training</xbe:product>
</xbe:products>
</xml>
</body>
</html>
This page contains the XML document, the conversion routine in JavaScript, as well as an HTML form. Figure 7.6 shows the result in the browser.
Figure 7.6: Running the script in a browser.
The page defines a form with one field for the exchange rate (you can find the current exchange rate on any financial Web site):
Rate: <input type="text" name="rate" value="1.0622" size="4"><br>
It also defines a read-only text area that serves as output:
<textarea name="output" rows="10" cols="50" readonly> </textarea>
Finally, it defines an XML island. XML islands are a proprietary Microsoft extension to HTML for embedding XML documents in HTML documents. In this case, the XML island gives the script access to Internet Explorer's XML parser. The price list is loaded into the island:
<xml id="products">
<xbe:products xmlns:xbe="http://www.psol.com/xbe2/listing7.1">
   <xbe:product price="499.00">XML Editor</xbe:product>
   <xbe:product price="199.00">DTD Editor</xbe:product>
   <xbe:product price="29.99">XML Book</xbe:product>
   <xbe:product price="699.00">XML Training</xbe:product>
</xbe:products>
</xml>
NOTE
XML islands are specific to Internet Explorer; they will not work in other browsers. You will see in a moment why you have to use browser-specific code.
The "Convert" button in the HTML file calls the JavaScript function convert(), which is the conversion routine. convert() accepts two parameters: the form and the XML island:
<input type="button" value="Convert" onclick="convert(controls,products)">
The script retrieves the exchange rate from the form. It walks through the document (see the next section for the details). It communicates with the XML parser through the XML island.
DOM Node
Remember that DOM defines a set of objects that the parser uses to represent an XML document. Because XML documents are hierarchical, it stands to reason that DOM defines objects to build hierarchies or, as they are known to programmers, trees.
The core object in DOM is the Node. Nodes are generic objects in the tree and most DOM objects are derived from nodes. There are specialized versions of nodes for elements, attributes, entities, text, and so on.
Node defines several properties to help you walk through the tree. The following are the most important ones:
nodeType is a code representing the type of the object. The list of codes is shown in Table 7.1.
parentNode is the parent (if any) of the current Node object.
childNodes is the list of children for the current Node object. childNodes is of type NodeList.
firstChild is the Node's first child.
lastChild is the Node's last child.
previousSibling is the Node immediately preceding the current one.
nextSibling is the Node immediately following the current one.
attributes is the list of attributes, if the current Node has any. The property is a NamedNodeMap.
In addition, Node defines five properties to manipulate the underlying object:
nodeName is the name of the Node (for an element, it's the tag name). The nodeName includes the namespace prefix, if any (for example, xbe:product).
localName/baseName is the local part of the name, that is, the name without the namespace prefix (for example, product). The official DOM recommendation specifies localName but, by default, Internet Explorer uses baseName. You can upgrade Internet Explorer to full DOM support by downloading the latest parser from msdn.microsoft.com/xml.
namespaceURI is, as the name implies, the namespace's URI (for example, http://www.psol.com/xbe2/listing7.1).
prefix is the namespace prefix (for example, xbe).
nodeValue is the value of the Node (for a text node, it's the text).
» DOM also defines functions that will be introduced in Chapter 9, "Writing XML."
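The navigation properties let you walk the tree without childNodes at all: start from firstChild and follow nextSibling until it is null. The sketch below shows this. It is a minimal illustration built on plain JavaScript objects that mimic those properties (an assumption so it runs outside a browser, for example under Node.js). The hypothetical helper element() wires up the links. The tree reproduces the top-level elements of Listing 7.3 (shown later in this chapter). Depending on the parser, a real document may also contain whitespace Text nodes between elements.

```javascript
// Hypothetical helper: builds a plain object carrying the DOM navigation
// properties (parentNode, firstChild, lastChild, previousSibling,
// nextSibling) so this sketch can run without a browser.
function element(name, children) {
  var node = { nodeType: 1, nodeName: name, parentNode: null,
               firstChild: null, lastChild: null,
               previousSibling: null, nextSibling: null };
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    child.parentNode = node;
    child.previousSibling = i > 0 ? children[i - 1] : null;
    child.nextSibling = i < children.length - 1 ? children[i + 1] : null;
  }
  node.firstChild = children.length > 0 ? children[0] : null;
  node.lastChild = children.length > 0 ? children[children.length - 1] : null;
  return node;
}

// Lists a node's children by following firstChild, then nextSibling,
// until nextSibling is null.
function childNames(node) {
  var names = [];
  for (var child = node.firstChild; child !== null; child = child.nextSibling)
    names.push(child.nodeName);
  return names;
}

var root = element("conversion", [
  element("html:script", []),
  element("html:center", []),
  element("xbe:products", [])
]);

console.log(childNames(root).join(", "));
// html:script, html:center, xbe:products
console.log(root.lastChild.parentNode === root); // true
```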
Table 7.1: nodeType Codes
Type | Code
Element | 1
Attribute | 2
Text | 3
CDATA section | 4
Entity reference | 5
Entity | 6
Processing instruction | 7
Comment | 8
Document | 9
Document type | 10
Document fragment | 11
Notation | 12
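When debugging a tree walk, it helps to print a readable name instead of a bare code. DOM also defines named constants for these codes (ELEMENT_NODE is 1, TEXT_NODE is 3, and so on). Older browsers may not expose them to scripts, so this small sketch keeps its own lookup table copied from Table 7.1. The names NODE_TYPE_NAMES and nodeTypeName() are hypothetical.

```javascript
// nodeType codes from Table 7.1, indexed by code (index 0 is unused).
var NODE_TYPE_NAMES = [
  null,
  "Element",                // 1
  "Attribute",              // 2
  "Text",                   // 3
  "CDATA section",          // 4
  "Entity reference",       // 5
  "Entity",                 // 6
  "Processing instruction", // 7
  "Comment",                // 8
  "Document",               // 9
  "Document type",          // 10
  "Document fragment",      // 11
  "Notation"                // 12
];

// Returns a readable name for a nodeType code, or "Unknown" for any
// code outside the table.
function nodeTypeName(code) {
  return NODE_TYPE_NAMES[code] || "Unknown";
}

console.log(nodeTypeName(1)); // Element
console.log(nodeTypeName(9)); // Document
```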
In the example, the function searchPrice() tests whether the current node is an element:
if(node.nodeType == 1)
{
   // with DOM Level 2, it would be node.localName
   if(node.baseName == "product" &&
      node.namespaceURI == "http://www.psol.com/xbe2/listing7.1")
   {
      // with DOM Level 2, it would be node.getAttributeNodeNS()
      var price = node.attributes.getQualifiedItem("price","");
      output.value += getText(node) + ": ";
      output.value += (price.value * rate) + "\r";
   }
   var children, i;
   children = node.childNodes;
   for(i = 0;i < children.length;i++)
      searchPrice(children.item(i),output,rate);
}
NodeList
NodeList is a DOM object that contains a list of Node objects. It has only two members:
length, the number of nodes in the list.
item(i), a method that returns node i in the list (counting from 0).
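Because length and item(i) are the only members, copying a NodeList into an ordinary array takes a single loop. The copy is useful because DOM lists such as childNodes are live: they change as the tree changes, which can surprise a loop that modifies the tree while it iterates. The sketch below assumes a hypothetical fakeNodeList() stand-in so it runs outside a browser. Like DOM's item(), the stand-in returns null for an out-of-range index.

```javascript
// Copies any NodeList-like object (anything with length and item(i))
// into a plain array, using only the two members NodeList defines.
function toArray(list) {
  var result = [];
  for (var i = 0; i < list.length; i++)
    result.push(list.item(i));
  return result;
}

// Hypothetical stand-in for a NodeList; item() returns null when the
// index is out of range.
function fakeNodeList(nodes) {
  return {
    length: nodes.length,
    item: function (i) { return i >= 0 && i < nodes.length ? nodes[i] : null; }
  };
}

var list = fakeNodeList([{ nodeName: "xbe:product" }, { nodeName: "#text" }]);
var copy = toArray(list);
console.log(copy.length);      // 2
console.log(copy[1].nodeName); // #text
```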
NamedNodeMap
The searchPrice() function also accesses the price attribute. Element objects expose their attributes in the attributes property. The attributes property is a NamedNodeMap object.
A NamedNodeMap is a list of nodes with a name attached to them. It supports the same property and method as NodeList (length and item(i)), but it also has special methods to access nodes by name:
getNamedItem()/getNamedItemNS()/getQualifiedItem() returns the node with the given name. getNamedItem() uses the node's qualified name (its nodeName), whereas getNamedItemNS() uses its namespace URI and local name. getQualifiedItem() is Microsoft's equivalent of getNamedItemNS(); it is not part of the official DOM standard, but it is the method Internet Explorer requires by default. Note that it takes the local name first and the namespace URI second.
setNamedItem()/setNamedItemNS() adds a node to the map, replacing any existing node with the same name. setNamedItem() uses the node's qualified name, whereas setNamedItemNS() uses its namespace URI and local name.
removeNamedItem()/removeNamedItemNS() removes the node with the given name.
searchPrice() illustrates how to use attributes to retrieve the price attribute.
// with DOM Level 2, it would be node.getAttributeNodeNS()
var price = node.attributes.getQualifiedItem("price","");
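The difference between the two lookup rules is easiest to see side by side. The toy map below is an illustration, not any browser's implementation. getNamedItem() matches the full qualified name, prefix included. getNamedItemNS() matches the namespace URI plus the local name and ignores the prefix. The xbe:currency attribute is hypothetical. The toy treats an empty namespace string like null ("no namespace"), as the later WHATWG DOM Standard does. DOM Level 2 is less explicit on this point, so treat that normalization as an assumption.

```javascript
// Toy NamedNodeMap over an array of Attr-like objects, illustrating the
// two lookup rules. Not a browser implementation.
function toyNamedNodeMap(attrs) {
  // Assumed normalization: "" and null both mean "no namespace".
  function sameNamespace(a, b) {
    return (a || null) === (b || null);
  }
  return {
    length: attrs.length,
    item: function (i) { return attrs[i] || null; },
    // Matches the qualified name (nodeName), prefix included.
    getNamedItem: function (name) {
      for (var i = 0; i < attrs.length; i++)
        if (attrs[i].nodeName === name) return attrs[i];
      return null;
    },
    // Matches namespace URI plus local name; the prefix is ignored.
    getNamedItemNS: function (namespaceURI, localName) {
      for (var i = 0; i < attrs.length; i++)
        if (sameNamespace(attrs[i].namespaceURI, namespaceURI) &&
            attrs[i].localName === localName) return attrs[i];
      return null;
    }
  };
}

// The price attribute from the listings, plus a hypothetical xbe:currency.
var attributes = toyNamedNodeMap([
  { nodeName: "price", localName: "price", namespaceURI: null, value: "499.00" },
  { nodeName: "xbe:currency", localName: "currency",
    namespaceURI: "http://www.psol.com/xbe2/listing7.1", value: "USD" }
]);

console.log(attributes.getNamedItem("price").value);          // 499.00
console.log(attributes.getNamedItemNS("", "price").value);    // 499.00
console.log(attributes.getNamedItemNS(
  "http://www.psol.com/xbe2/listing7.1", "currency").value);  // USD
console.log(attributes.getNamedItem("currency"));             // null
```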
CAUTION
Bear in mind that, by default, Internet Explorer 5.0 and 5.5 recognize an obsolete version of DOM. However, if you have upgraded them to the latest Microsoft parser, you will need to use the official DOM constructs.
Document Object
The topmost node in a DOM tree is Document. Document inherits from Node so it can be inserted in a tree. Document inherits most of its properties from Node and adds only three new properties:
documentElement is the root element of the document.
implementation is a special object that defines methods that work without a document (such as createDocument(), which creates new documents).
doctype is the document type declaration (the DTD), represented by a DocumentType object.
Note that the Document object sits one level above the root element: the root element is a child of the Document and is available through the documentElement property.
To return a tree, the parser returns a Document object. From the Document object, it is possible to access the complete document tree.
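To make the "one level above" relationship concrete, the sketch below models a Document whose children are a processing instruction (such as the xml-stylesheet instruction in Listing 7.3, shown later) and the root element. It then recovers the root the way documentElement exposes it: as the Document's one Element child. The objects are plain stand-ins (an assumption), and childNodes is a plain array rather than a real NodeList. findDocumentElement() is a hypothetical name. In a browser you would simply read documentElement.

```javascript
// Plain stand-ins for a parsed document (assumed, so the sketch runs
// outside a browser). childNodes is a plain array here.
var rootElement = { nodeType: 1, nodeName: "conversion" };
var doc = {
  nodeType: 9,
  nodeName: "#document",
  childNodes: [
    { nodeType: 7, nodeName: "xml-stylesheet" }, // processing instruction
    rootElement
  ]
};

// Returns the Document's Element child, which is what the
// documentElement property holds.
function findDocumentElement(docNode) {
  for (var i = 0; i < docNode.childNodes.length; i++)
    if (docNode.childNodes[i].nodeType === 1) return docNode.childNodes[i];
  return null;
}

console.log(findDocumentElement(doc).nodeName); // conversion
```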
CAUTION
Unfortunately, the DOM recommendation starts with the Document object, not with the parser itself. DOM Level 3 defines new methods to load and save XML documents but, for the time being, one must use proprietary solutions such as Microsoft's XML island.
Element Object
Element is the descendant of Node that is used specifically to represent XML elements. In addition to the properties inherited from Node, Element defines the tagName property for its tag name.
Since Element is a Node, it inherits the attributes property from Node. For an element, the property is a NamedNodeMap of Attr objects.
Element also defines methods to extract information (there are more methods to create documents and they are introduced in Chapter 9, "Writing XML"):
getElementsByTagName()/getElementsByTagNameNS() return a NodeList of all descendants of the element with a given name. The first method works with the element's qualified name (xbe:product), whereas the second expects a namespace URI and a local name (http://www.psol.com/xbe2/listing7.1 and product).
getAttributeNode() and getAttributeNodeNS() return an attribute given, respectively, its qualified name or the combination of its namespace URI and local name.
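To show what getElementsByTagNameNS() computes, here is a recursive sketch that collects matching descendant elements in document order. It also honors the "*" wildcard that DOM Level 2 allows for either argument. The tree is built from plain stand-ins using a hypothetical el() helper, so the sketch runs outside a browser, and it returns an array where a browser returns a live NodeList. It mirrors the structure of Listing 7.3 (shown later in this chapter).

```javascript
// Recursive sketch of getElementsByTagNameNS(): every descendant element,
// in document order, whose namespace URI and local name match ("*"
// matches anything). Returns an array rather than a live NodeList.
function elementsByTagNameNS(node, namespaceURI, localName) {
  var found = [];
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType !== 1) continue;           // elements only
    var nsMatch = namespaceURI === "*" || child.namespaceURI === namespaceURI;
    var nameMatch = localName === "*" || child.localName === localName;
    if (nsMatch && nameMatch) found.push(child);
    found = found.concat(elementsByTagNameNS(child, namespaceURI, localName));
  }
  return found;
}

// Hypothetical helper building an element stand-in.
function el(ns, prefix, local, children) {
  return { nodeType: 1, namespaceURI: ns, localName: local,
           nodeName: prefix ? prefix + ":" + local : local,
           childNodes: children || [] };
}

var XBE = "http://www.psol.com/xbe2/listing7.1";
var XHTML = "http://www.w3.org/1999/xhtml";
var tree = el(null, null, "conversion", [
  el(XHTML, "html", "pre"),
  el(XBE, "xbe", "products", [el(XBE, "xbe", "product"), el(XBE, "xbe", "product")])
]);

console.log(elementsByTagNameNS(tree, XBE, "product").length); // 2
console.log(elementsByTagNameNS(tree, XHTML, "pre").length);   // 1
```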
Attr
Attr objects represent the attributes. Attr is a Node descendant. In addition to the properties it inherits from Node, Attr defines the following properties:
name is the name of the attribute.
value is the value of the attribute.
ownerElement is the Element this attribute is attached to.
specified is true if the attribute was given a value in the document; it is false if the attribute has taken a default value from the DTD.
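One detail matters for the price computation: Attr.value is always a string. The listings get away with price.value * rate because * converts both operands to numbers. With + the same code would concatenate strings instead. The sketch below converts explicitly, rejects a value that does not begin with a number, and rounds to cents. The Attr-like object is a hypothetical stand-in, and rounding with toFixed(2) is a design choice of this sketch, not of the listings.

```javascript
// Hypothetical stand-in for the Attr returned by getAttributeNodeNS().
var priceAttr = { name: "price", value: "29.99", specified: true,
                  ownerElement: { nodeName: "xbe:product" } };

// parseFloat() reads the leading number in the string; NaN means there
// was none. The result is rounded to two decimals for display.
function convertPrice(attr, rate) {
  var amount = parseFloat(attr.value);
  if (isNaN(amount)) throw new Error("not a number: " + attr.value);
  return (amount * rate).toFixed(2);
}

console.log(convertPrice(priceAttr, 1.0622)); // 31.86
console.log(priceAttr.value + 1);             // 29.991 (string concatenation)
console.log(priceAttr.ownerElement.nodeName); // xbe:product
```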
TIP
The W3C decided to call the attribute object Attr to avoid confusion with object properties. In some languages, object properties are called object attributes. An Attribute object would have been very confusing.
Text Object
As the name implies, Text objects represent text such as the textual content of an element.
The listing uses getText() to return the textual content of a node. For safety, the function iterates over the element's children looking for Text objects:
function getText(node)
{
   // i is declared locally so it does not leak into the global scope
   var children = node.childNodes,
       text = "", i;
   for(i = 0;i < children.length;i++)
   {
      var n = children.item(i);
      if(n.nodeType == 3)
         text += n.data;
   }
   return text;
}
Why bother iterating over the children or, for that matter, why bother with Text objects at all? Why can't the parser attach the text directly to the element? The problem is mixed content, where an element contains both text and other elements. The following <p> element contains two Text objects and one Element object (<img>):
<p>The element can contain text and other elements such as images <img src="logo.gif"/> or other.</p>
The <img> element object splits the text into two text objects:
The text before the <img> element: "The element can contain text and other elements such as images " (including the trailing space).
The text after it: " or other."
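Running getText() on that <p> element shows why the loop is needed: the result is the concatenation of both Text children, and the <img> element is skipped. The nodes here are plain stand-ins with a NodeList-like childNodes, built by a hypothetical nodeList() helper so the sketch runs outside a browser. getText() uses the same logic as the listings, with i declared locally. Note that it collects only direct Text children. Text nested inside a child element would be ignored.

```javascript
// Hypothetical NodeList stand-in exposing length and item(i).
function nodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}

// The <p> element from the text: Text, <img> Element, Text.
var p = {
  nodeType: 1, nodeName: "p",
  childNodes: nodeList([
    { nodeType: 3, data: "The element can contain text and other elements such as images " },
    { nodeType: 1, nodeName: "img", childNodes: nodeList([]) },
    { nodeType: 3, data: " or other." }
  ])
};

// Same logic as getText() in the listings: concatenate direct Text
// children (nodeType 3) and skip everything else.
function getText(node) {
  var children = node.childNodes, text = "";
  for (var i = 0; i < children.length; i++) {
    var n = children.item(i);
    if (n.nodeType == 3) text += n.data;
  }
  return text;
}

console.log(getText(p));
```

The two spaces around the missing image in the output come from the trailing and leading spaces of the two Text nodes.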
Walking the Element Tree
To extract information or otherwise manipulate the document, the application walks the document tree. You have already seen this happening with the XSL processor.
Essentially, the script visits every element in the tree. This is easy with a recursive algorithm. To visit a node:
Do any node-specific processing, such as printing data.
Visit all its children.
Given that children are nodes, to visit them means visiting their children, and the children of their children, and so on.
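The two steps above translate directly into a short recursive function: process the node, then walk each child in order. This is a pre-order, depth-first traversal. The sketch uses plain objects with a childNodes array as stand-ins for DOM nodes (an assumption), and walk() is a hypothetical name. The callback plays the role of "node-specific processing."

```javascript
// Recursive depth-first (pre-order) walk: process the node first, then
// visit each of its children, in order.
function walk(node, visit) {
  visit(node);
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++)
    walk(children[i], visit);
}

var tree = { nodeName: "xbe:products", childNodes: [
  { nodeName: "xbe:product", childNodes: [{ nodeName: "#text" }] },
  { nodeName: "xbe:product", childNodes: [{ nodeName: "#text" }] }
]};

var order = [];
walk(tree, function (n) { order.push(n.nodeName); });
console.log(order.join(" > "));
// xbe:products > xbe:product > #text > xbe:product > #text
```

The first product's text is visited before the second product, which is what makes the search depth-first rather than level by level.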
The function searchPrice() illustrates this process. It visits each node by recursively calling itself for all children of the current node. This is a depth-first search, as you saw with the XSL processor. Figure 7.7 illustrates how it works.
function searchPrice(node,output,rate)
{
   if(node.nodeType == 1)
   {
      // with DOM Level 2, it would be localName
      if(node.baseName == "product" &&
         node.namespaceURI == "http://www.psol.com/xbe2/listing7.1")
      {
         // with DOM Level 2, it would be getAttributeNodeNS()
         var price = node.attributes.getQualifiedItem("price","");
         output.value += getText(node) + ": ";
         output.value += (price.value * rate) + "\r";
      }
      var children, i;
      children = node.childNodes;
      for(i = 0;i < children.length;i++)
         searchPrice(children.item(i),output,rate);
   }
}
Figure 7.7: Walking down the tree.
There is a major simplification in searchPrice(): The function only examines nodes of type element (node.nodeType == 1). This is logical given that the function is looking for product elements, so there is no point in examining other types of nodes such as text or entities. As you will see, more complex applications have to examine all the nodes.
At each step, the function tests whether the current node is a product. For each product element, it extracts the price attribute, computes the price in euros, and prints it.
NOTE
This listing shows, for the first time, how an application processes namespaces. It's not difficult: the application simply compares an element's namespace URI with a predefined value.
In effect, the application tests the element name by comparing both its local name and its namespace:
if(node.baseName == "product" && node.namespaceURI=="http://www.psol.com/xbe2/listing7.1")
Next, the function turns to the node's children. It loops through all the children and recursively calls itself for each child.
To walk through the node's children, the function accesses the childNodes property. childNodes contains a NodeList.
To get started, we just call searchPrice(), passing it the root of the XML document, the text area where it writes the result, and the exchange rate. The root is accessible through the documentElement property:
var output = form.output,
    rate = form.rate.value,
    root = document.documentElement;
output.value = "";
searchPrice(root,output,rate);
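Putting the pieces together, the sketch below runs the same search over the price list, outside a browser. It uses the DOM Level 2 forms that the comments in Listing 7.2 mention (localName and getAttributeNodeNS()) on plain stand-in objects, so it is an illustration under those assumptions, not the listing itself. Two hypothetical changes make it testable: it collects lines in an array instead of writing to a text area, and it rounds each price to two decimals. The helpers nodeList(), textNode() and product() are hypothetical.

```javascript
var XBE = "http://www.psol.com/xbe2/listing7.1";

// Hypothetical stand-ins: a NodeList, a Text node, and product elements
// carrying only what searchPrice() and getText() read.
function nodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}
function textNode(data) {
  return { nodeType: 3, data: data };
}
function product(price, name) {
  return {
    nodeType: 1, localName: "product", namespaceURI: XBE,
    childNodes: nodeList([textNode(name)]),
    getAttributeNodeNS: function (ns, local) {
      return local == "price" ? { value: price } : null;
    }
  };
}
var products = {
  nodeType: 1, localName: "products", namespaceURI: XBE,
  childNodes: nodeList([
    product("499.00", "XML Editor"), product("199.00", "DTD Editor"),
    product("29.99", "XML Book"), product("699.00", "XML Training")
  ])
};

function getText(node) {
  var children = node.childNodes, text = "";
  for (var i = 0; i < children.length; i++) {
    var n = children.item(i);
    if (n.nodeType == 3) text += n.data;
  }
  return text;
}

// Same recursive walk as searchPrice(); only elements are examined.
function searchPrice(node, lines, rate) {
  if (node.nodeType == 1) {
    if (node.localName == "product" && node.namespaceURI == XBE) {
      var price = node.getAttributeNodeNS("", "price").value;
      lines.push(getText(node) + ": " + (price * rate).toFixed(2));
    }
    var children = node.childNodes;
    for (var i = 0; i < children.length; i++)
      searchPrice(children.item(i), lines, rate);
  }
}

var lines = [];
searchPrice(products, lines, 1.0622);
console.log(lines.join("\n"));
// XML Editor: 530.04
// DTD Editor: 211.38
// XML Book: 31.86
// XML Training: 742.48
```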
A More Standard Version
Internet Explorer is close to the DOM standard but not quite there (unless you upgrade to the latest version of the parser, available from msdn.microsoft.com/xml).
For comparison, we'll build a similar application for Netscape 6, which has more complete support for DOM. Netscape takes a different approach to XML documents. Instead of using a proprietary XML island, Netscape loads the XML document directly. To build the user interface, you insert HTML elements into the XML document (or, more precisely, XHTML elements, that is, HTML elements written with the XML syntax).
This approach has advantages; most notably, it frees us from the proprietary XML island because the whole page is an XML document. On the downside, it makes it difficult to load new XML documents. Furthermore, a bug prevents us from using text input fields. Listings 7.3 and 7.4 show the application.
Listing 7.3: conversion-ns6.xml
<?xml version="1.0"?>
<?xml-stylesheet href="common.css" type="text/css"?>
<conversion xmlns:html="http://www.w3.org/1999/xhtml">
<html:script language="JavaScript"><![CDATA[
function convert()
{
   var rate = 1.06224,
       root = document.documentElement,
       products = root.getElementsByTagNameNS(
          "http://www.psol.com/xbe2/listing7.1","products"),
       outputs = root.getElementsByTagNameNS(
          "http://www.w3.org/1999/xhtml","pre");
   if(outputs.length < 1 || products.length < 1)
      return;
   var output = outputs.item(0).firstChild;
   output.data = "";
   searchPrice(products.item(0),output,rate);
}
function searchPrice(node,output,rate)
{
   if(node.nodeType == 1)
   {
      if(node.localName == "product" &&
         node.namespaceURI == "http://www.psol.com/xbe2/listing7.1")
      {
         var price = node.getAttributeNodeNS("","price").value;
         output.data += getText(node) + ": ";
         output.data += (price * rate) + "\n";
      }
      var children, i;
      children = node.childNodes;
      for(i = 0;i < children.length;i++)
         searchPrice(children.item(i),output,rate);
   }
}
function getText(node)
{
   var children = node.childNodes,
       text = "", i;
   for(i = 0;i < children.length;i++)
   {
      var n = children.item(i);
      if(n.nodeType == 3)
         text += n.data;
   }
   return text;
}
]]></html:script>
<html:center>
<!-- make sure there is one character in the text area -->
<html:pre> </html:pre>
<html:form id="controls">
<html:input type="button" value="Convert" onclick="convert()"/>
</html:form>
</html:center>
<xbe:products xmlns:xbe="http://www.psol.com/xbe2/listing7.1">
   <xbe:product price="499.00">XML Editor</xbe:product>
   <xbe:product price="199.00">DTD Editor</xbe:product>
   <xbe:product price="29.99">XML Book</xbe:product>
   <xbe:product price="699.00">XML Training</xbe:product>
</xbe:products>
</conversion>
Listing 7.4: common.css
product {
   display: block;
   text-align: center;
}
Figure 7.8 illustrates the result in a browser.
Figure 7.8: Price conversions in Netscape 6.
The name of the document's root element is arbitrary because the script never refers to it by name. The root declares the XHTML namespace. The document also uses a CSS style sheet (from Listing 7.4):
<?xml-stylesheet href="common.css" type="text/css"?>
<conversion xmlns:html="http://www.w3.org/1999/xhtml">
Where needed, we can insert XHTML elements for the script (html:script) or for a simple HTML form:
<html:center>
<!-- make sure there is one character in the text area -->
<html:pre> </html:pre>
<html:form id="controls">
<html:input type="button" value="Convert" onclick="convert()"/>
</html:form>
</html:center>
The XML price list is inserted directly in the XML document. It is recognizable through its own namespace:
<xbe:products xmlns:xbe="http://www.psol.com/xbe2/listing7.1">
   <xbe:product price="499.00">XML Editor</xbe:product>
   <xbe:product price="199.00">DTD Editor</xbe:product>
   <xbe:product price="29.99">XML Book</xbe:product>
   <xbe:product price="699.00">XML Training</xbe:product>
</xbe:products>
Unfortunately, a bug prevents us from using text input fields or text areas in XHTML. The exchange rate must be stored as a variable in the script; to update the exchange rate, you need to edit the script. Because of the same problem with text areas, the script writes its output in a pre element.
Because the entire document is an XML document, documentElement points to the conversion root. The script uses getElementsByTagNameNS() to retrieve the product list. Notice how the namespace helps pinpoint the right elements. The remainder of the script should be familiar because it calls searchPrice():
var rate = 1.06224,
    root = document.documentElement,
    products = root.getElementsByTagNameNS(
       "http://www.psol.com/xbe2/listing7.1","products"),
    outputs = root.getElementsByTagNameNS(
       "http://www.w3.org/1999/xhtml","pre");
if(outputs.length < 1 || products.length < 1)
   return;
var output = outputs.item(0).firstChild;
output.data = "";
searchPrice(products.item(0),output,rate);
searchPrice() is nearly identical to the version in Listing 7.2, except that it uses standard DOM properties and methods, such as
if(node.localName == "product" && node.namespaceURI== "http://www.psol.com/xbe2/listing7.1")
or
var price = node.getAttributeNodeNS("","price").value;
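Since the two listings differ mainly in these two spots, one way to share a single searchPrice() between browsers is to wrap them in small helpers that try the standard form first. This is a hypothetical sketch, not part of either listing. It assumes that reading a property the parser does not support yields undefined, and that typeof reports a supported method as "function". We have not verified that every version of Microsoft's parser behaves this way, so test it before relying on it. The two node objects are stand-ins that imitate the standard and the Internet Explorer 5 behavior.

```javascript
// Local name under either API: localName (DOM recommendation) or
// baseName (Internet Explorer 5 default). Assumes unsupported
// properties read as undefined.
function localNameOf(node) {
  return node.localName !== undefined ? node.localName : node.baseName;
}

// Value of an attribute that has no namespace: standard
// getAttributeNodeNS() if available, else Microsoft's getQualifiedItem(),
// which takes the local name first and the namespace URI second.
function attributeValue(node, name) {
  var attr = typeof node.getAttributeNodeNS == "function"
    ? node.getAttributeNodeNS("", name)
    : node.attributes.getQualifiedItem(name, "");
  return attr ? attr.value : null;
}

// Stand-ins imitating the two APIs.
var standardNode = {
  localName: "product",
  getAttributeNodeNS: function (ns, local) {
    return local == "price" ? { value: "499.00" } : null;
  }
};
var ieNode = {
  baseName: "product",
  attributes: {
    getQualifiedItem: function (local, ns) {
      return local == "price" ? { value: "499.00" } : null;
    }
  }
};

console.log(localNameOf(standardNode), attributeValue(standardNode, "price"));
console.log(localNameOf(ieNode), attributeValue(ieNode, "price"));
```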