XML Reference Guide

Mar 14, 2003


One of the worst inconveniences in working with the Document Object Model comes when you want to select a single node or group of nodes from within the document based on their name or value. A quick, desperate look at the documentation shows you that there is simply no easy way to do it.

That's where the DOM Level 3 XPath specification comes in. Finally, there's an easy way to select a node or set of nodes without having to rig up an XSLT transformation.

It works like this: you create an XPathExpression, and then you use its evaluate() method to produce an XPathResult. The XPathResult is an object that can be cast (or as the spec puts it, "coerced") into various types, depending on the data. For example, a single value might be a string or a number. An XPathResult might also represent a group of nodes, in which case you can either retrieve an individual snapshotItem or iterate through all of them.
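Sketched in code, that two-step flow might look like the following. This is only a rough sketch: it assumes the object passed in as doc exposes createExpression(), as the specification's XPathEvaluator interface describes, and that results are read through the numberValue property of the ECMAScript binding. The helper name countMatches is my own invention, not part of any API.

```javascript
// Hypothetical helper: compile an XPath expression, evaluate it against a
// context node, and read the result as a number.
// Assumption: `doc` implements createExpression(), as the spec's
// XPathEvaluator interface describes.
function countMatches(doc, contextNode, xpath) {
  var NUMBER_TYPE = 1; // numeric value of XPathResult.NUMBER_TYPE in the spec

  // Step 1: create the XPathExpression (null means "no namespace resolver").
  var expression = doc.createExpression("count(" + xpath + ")", null);

  // Step 2: evaluate it to produce an XPathResult, coerced to a number.
  var result = expression.evaluate(contextNode, NUMBER_TYPE, null);
  return result.numberValue;
}

// In a browser with native support you might call:
//   countMatches(document, document, "//a");
```

The point of compiling first is that the same expression object can be evaluated against many different context nodes without being parsed again.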

Let's see it in action.

Now, before we start, you will, of course, need an implementation of DOM Level 3 XPath. To make things simple, let's use a browser-based version. Mozilla reportedly supports XPath out of the box, though at the time of this writing I had trouble getting it to work in Firefox. (Note: The author's computer has been compromised by his teenage son, so your mileage may vary.) At any rate, you can also download an implementation of DOM Level 3 XPath in JavaScript -- yes, JavaScript -- as created by Dimitri Glazkov, that will duplicate Mozilla's support in Microsoft Internet Explorer.

Download the file and unzip it, then put the dom-xpath.js file in the same directory in which you'll save your *.html file. Now you're ready to start.

Create the new HTML file and add the following code:

<html>
<head>
    <title>XPath demonstration</title>
    <script language="javascript" type="text/javascript" src="dom-xpath.js"></script>
</head>
<body>

<p>Please visit the following sites:</p>

<a href="http://www.informit.com">InformIT.com</a><br />
<a href="http://www.chaosmagnet.com">Chaos Magnet</a><br />
<a href="http://www.vanguardreport.com">The Vanguard Science Fiction Report</a><br />


</body>
</html>

This creates a simple page, shown in Figure 1.

Now, the DOM Level 3 XPath documentation specifies that the evaluate() method we're going to use is part of the XPathEvaluator interface. (XPathExpression has its own, shorter evaluate() method for expressions that have already been compiled.) In the version we're using, it's the document object that implements XPathEvaluator, so we can create an XPathResult as follows:

...
<a href="http://www.vanguardreport.com">The Vanguard Science Fiction Report</a><br />

<hr />

<script type="text/javascript">

  var result = document.evaluate("count(//a)", 
                                 document, 
                                 null, 
                                 XPathResult.NUMBER_TYPE, 
                                 null);

</script>

</body>
</html>

Let's look at the arguments for this method. The first argument, count(//a), is simply an XPath expression that returns the number of a elements in the document. It could just as easily have been /html/body/div[@class='special'] or //form/input.

The second argument is the context node for the expression. This is the node designated as "here" for any relative expressions. I could have specified the body as the context node and used count(a) for the XPath expression, because on this page every a element is a direct child of the body.
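To make the context node idea concrete, here is a small sketch of that alternative. It assumes a native-style implementation where results expose a numberValue property; the helper name countChildLinks is hypothetical.

```javascript
// Hypothetical helper: count the <a> elements that are direct children of
// the given context node. The relative expression "count(a)" is resolved
// against whatever node is passed as the second argument to evaluate().
function countChildLinks(doc, contextNode) {
  var NUMBER_TYPE = 1; // numeric value of XPathResult.NUMBER_TYPE in the spec
  var result = doc.evaluate("count(a)", contextNode, null, NUMBER_TYPE, null);
  return result.numberValue;
}

// In a browser, with the sample page loaded, you might call:
//   countChildLinks(document, document.body);
```

Note that count(a) only sees direct children of the context node. Links nested inside a p or div would need count(.//a) instead.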

The third argument is null here, because we don't have any namespaces to worry about. If we did, we could use this argument to specify an XPathNSResolver object.
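If you did need namespaces, a resolver might look something like the sketch below. It assumes the implementation accepts any object with a lookupNamespaceURI() method, which is how the specification describes XPathNSResolver. The xhtml prefix and the helper name countXhtmlLinks are arbitrary choices for illustration.

```javascript
// Hypothetical resolver: maps the prefixes used in an XPath expression to
// namespace URIs. Unknown prefixes resolve to null.
var resolver = {
  lookupNamespaceURI: function (prefix) {
    var namespaces = {
      xhtml: "http://www.w3.org/1999/xhtml"
    };
    return namespaces.hasOwnProperty(prefix) ? namespaces[prefix] : null;
  }
};

// Hypothetical helper: count namespaced <a> elements using the resolver.
function countXhtmlLinks(doc) {
  var NUMBER_TYPE = 1; // numeric value of XPathResult.NUMBER_TYPE in the spec
  var result = doc.evaluate("count(//xhtml:a)", doc, resolver, NUMBER_TYPE, null);
  return result.numberValue;
}
```

The prefix in the expression does not have to match the prefix used in the document itself. Only the namespace URI it resolves to matters.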

The fourth argument specifies the type in which we want our data returned. Depending on the actual values of the content -- you can't turn "bogus" into a number, after all -- you can specify that the data should be returned as a number, string, boolean, iterator, snapshot, or single node. You can also specify XPathResult.ANY_TYPE to let the expression itself determine the type, and then check the result's resultType attribute to find out what you got.
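Here is one way the ANY_TYPE approach could play out. This is a sketch, not a definitive pattern: the numeric constants are copied from the specification so the code doesn't depend on a global XPathResult object, and the helper name evaluateSimple is hypothetical.

```javascript
// Numeric values of a few XPathResult type constants, as defined by the
// DOM Level 3 XPath specification.
var ANY_TYPE = 0, NUMBER_TYPE = 1, STRING_TYPE = 2, BOOLEAN_TYPE = 3;

// Hypothetical helper: evaluate with ANY_TYPE, then unwrap simple values
// based on the type the expression actually produced. Node-set results
// are returned untouched for the caller to iterate.
function evaluateSimple(doc, xpath) {
  var result = doc.evaluate(xpath, doc, null, ANY_TYPE, null);
  switch (result.resultType) {
    case NUMBER_TYPE:
      return result.numberValue;
    case STRING_TYPE:
      return result.stringValue;
    case BOOLEAN_TYPE:
      return result.booleanValue;
    default:
      return result;
  }
}

// In a browser you might call:
//   evaluateSimple(document, "count(//a)");
//   evaluateSimple(document, "string(//title)");
```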

The final argument is an existing XPathResult object that the method can reuse to hold the result. Because I'm passing null here, the method simply creates and returns a new result, as expected.

OK, so we've created the XPathResult. Now how do we get at it? Simple. We use the getXXX() methods, like so:

...
<script type="text/javascript">

  var result = document.evaluate("count(//a)", 
                                 document, 
                                 null, 
                                 XPathResult.NUMBER_TYPE, 
                                 null);

    document.write("There were ");
    document.write(result.getNumberValue());
    document.write(" links on this page.");

</script>
...

In this case, we're looking for a number, so I used the getNumberValue() method to get the numberValue attribute. (Of course, in this case, I wasn't actually doing any calculations, so I could just as easily have used a string.) You can also get the stringValue, booleanValue, and singleNodeValue values. This last enables you to treat the result as a typical DOM node, requesting attributes or child nodes just as you would in any other DOM application. Note that in the specification's ECMAScript binding these attributes are plain properties, so a native browser implementation expects result.numberValue rather than a getter call.

The result looks like Figure 2.
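Coming back to singleNodeValue for a moment, a sketch of treating the result as an ordinary DOM node might look like this. It assumes the property-style access of the ECMAScript binding, and the helper name firstLinkHref is hypothetical.

```javascript
// Hypothetical helper: return the href of the first <a> element in
// document order, or null if the page has no links.
function firstLinkHref(doc) {
  var FIRST_ORDERED_NODE_TYPE = 9; // value of XPathResult.FIRST_ORDERED_NODE_TYPE
  var result = doc.evaluate("//a", doc, null, FIRST_ORDERED_NODE_TYPE, null);

  // singleNodeValue is a regular DOM node (or null when nothing matched),
  // so the usual DOM methods such as getAttribute() apply.
  var node = result.singleNodeValue;
  return node ? node.getAttribute("href") : null;
}

// In a browser, with the sample page loaded, you might call:
//   firstLinkHref(document);
```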

As I mentioned earlier, DOM Level 3 XPath also enables you to select groups of nodes and iterate through them. For example, I could loop through each of those a elements:

...
    document.write("There were ");
    document.write(result.getNumberValue());
    document.write(" links on this page.");

    document.write("<br /><br />");
    document.write("They were: <br />");

    var iterator = document.evaluate("//a", 
                                     document, 
                                     null, 
                                     XPathResult.UNORDERED_NODE_ITERATOR_TYPE, 
                                     null);
    var item;
    var outputString = "";
    while(item = iterator.iterateNext()) {
         outputString = outputString + item.firstChild.nodeValue;
         outputString = outputString + " at ";
         outputString = outputString + item.getAttribute('href');
         outputString = outputString + "<br />";
    }
    document.write(outputString);

</script>

</body>
</html>

Here I'm first creating the iterator, just as I originally created the result, except in this case I'm specifying that I want to get back a node iterator rather than a single value type. From there, I'm iterating through each item. For each item, I'm first retrieving the text child of the element, and then the value of the href attribute. In both cases, I'm dealing with the item as a DOM node, independent of how I've created it. The results are shown in Figure 3.
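The snapshot types work in much the same way, except that you index into the result rather than stepping through it. The following is a minimal sketch under the same assumptions as before (property-style access, constants copied from the specification). The helper name listLinks is hypothetical.

```javascript
// Hypothetical helper: collect "text at href" lines for every <a> element,
// using a snapshot result instead of an iterator.
function listLinks(doc) {
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var snapshot = doc.evaluate("//a", doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var lines = [];

  // snapshotLength and snapshotItem(i) give array-like access to the nodes.
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    var item = snapshot.snapshotItem(i);
    lines.push(item.firstChild.nodeValue + " at " + item.getAttribute("href"));
  }
  return lines;
}

// In a browser you might call:
//   document.write(listLinks(document).join("<br />"));
```

The trade-off is that an iterator becomes invalid if the document changes while you're looping, whereas a snapshot is a fixed list captured at evaluation time. That makes a snapshot the safer choice if the loop body itself modifies the page.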

Pretty simple, eh? Makes you wonder why they didn't just include this back in DOM Level 1, or at least Level 2. No matter, it's here now. Unfortunately, support is still limited, but it's working its way into Xerces, and MSXML has a similar feature, selectSingleNode. Hopefully in the near future you'll be able to use it in your projects instead of kludging together other solutions using XSLT or other techniques.
