DOM and Java
DOM is not limited to browsers. Nor is it limited to JavaScript. DOM is a multiplatform, multilanguage API.
DOM and IDL
There are versions of DOM for JavaScript, Java, and C++. In fact, there are versions of DOM for most languages because the W3C adopted a clever trick: It specified DOM using the OMG IDL.
OMG IDL is a specification language for object interfaces. It is used to describe not what an object does but which methods and which properties it has. IDL, which stands for Interface Definition Language, was published by the OMG, the Object Management Group (http://www.omg.org).
The good thing about IDL is that it has been mapped to many object-oriented programming languages. There are mappings of IDL for Java, C++, Smalltalk, Ada, and even COBOL. By writing the DOM recommendation in IDL, the W3C benefits from this cross-language support. Essentially, DOM is available in all of these languages.
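To make the mapping concrete, here is a small sketch of how two IDL attributes of the Node interface surface in the Java binding. The class name IdlMappingSketch and the helper hasMethod() are hypothetical, introduced only for this illustration; the IDL lines in the comments are abridged from the DOM recommendation. A read-only attribute should map to a getter alone, and a read-write attribute to a getter and a setter.

```java
import org.w3c.dom.Node;

public class IdlMappingSketch {

    // Returns true if org.w3c.dom.Node declares a public method
    // with the given name and parameter types.
    public static boolean hasMethod(String name, Class<?>... parameterTypes) {
        try {
            Node.class.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // IDL (abridged): readonly attribute DOMString nodeName;
        System.out.println("getNodeName:  " + hasMethod("getNodeName"));
        System.out.println("setNodeName:  " + hasMethod("setNodeName", String.class));

        // IDL (abridged): attribute DOMString nodeValue;
        System.out.println("getNodeValue: " + hasMethod("getNodeValue"));
        System.out.println("setNodeValue: " + hasMethod("setNodeValue", String.class));
    }
}
```

Only setNodeName should be reported as missing, because nodeName is declared read-only in the IDL.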
CAUTION
The fact that DOM is specified in IDL does not mean that parsers must be implemented as CORBA objects. In fact, to the best of my knowledge, there are no XML parsers implemented as CORBA objects. The W3C used only the multilanguage aspect of IDL and left out all the distribution aspects.
Java and JavaScript are privileged languages for XML development. Most XML tools are written in Java or have a Java version. Indeed, there are probably more XML parsers written in Java than in all other languages combined. Most of these parsers support the DOM interface.
» If you would like to learn how to write Java software for XML, read Appendix A, "Crash Course on Java," (page 443).
A Java Version of the DOM Application
Listing 7.7 is the conversion utility in Java. As you can see, it uses the same objects as the JavaScript listing. The objects have the same properties and methods. That's because it's the same DOM underneath.
Listing 7.7: Conversion.java
package com.psol.xbe2;

import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.apache.xerces.parsers.*;

public class Conversion
{
    public static void main(String[] args)
        throws Exception
    {
        if(args.length < 2)
        {
            System.out.print("java com.psol.xbe2.Conversion");
            System.out.println(" filename rate");
            return;
        }
        double rate = Double.parseDouble(args[1]);
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new File(args[0]));
        Conversion conversion = new Conversion(document,rate);
    }

    public Conversion(Document document,double rate)
    {
        searchPrice(document.getDocumentElement(),rate);
    }

    protected void searchPrice(Node node,double rate)
    {
        if(node.getNodeType() == Node.ELEMENT_NODE)
        {
            Element element = (Element)node;
            if(element.getLocalName().equals("product") &&
               element.getNamespaceURI().equals(
                   "http://www.psol.com/xbe2/listing7.1"))
            {
                NamedNodeMap atts = element.getAttributes();
                Attr att = (Attr)atts.getNamedItemNS(null,"price");
                double price = att != null ?
                    Double.parseDouble(att.getValue()) : 0;
                System.out.print(getText(node) + ": ");
                System.out.println(price * rate);
            }
            NodeList children = node.getChildNodes();
            for(int i = 0;i < children.getLength();i++)
                searchPrice(children.item(i),rate);
        }
    }

    protected String getText(Node node)
    {
        StringBuffer buffer = new StringBuffer();
        NodeList children = node.getChildNodes();
        for(int i = 0;i < children.getLength();i++)
        {
            Text text = (Text)children.item(i);
            buffer.append(text.getData());
        }
        return buffer.toString();
    }
}
Three Major Differences
The first difference between the Java and the JavaScript versions is that, in Java, DOM properties are read through accessor methods of the form getPropertyName().
Therefore, the following JavaScript code from Listing 7.3
if(node.localName == "product" &&
   node.namespaceURI == "http://www.psol.com/xbe2/listing7.1")
is slightly different in Java:
if(element.getLocalName().equals("product") &&
   element.getNamespaceURI().equals(
       "http://www.psol.com/xbe2/listing7.1"))
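One caveat about the Java form: getNamespaceURI() returns null for an element that is in no namespace, so calling equals() on its result would throw a NullPointerException for such an element. The following is a minimal sketch of a more defensive test that puts the constant first. The class ProductCheck, its methods, and the two sample documents are hypothetical and serve only as an illustration; they are not part of Listing 7.7.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ProductCheck {

    public static final String NS = "http://www.psol.com/xbe2/listing7.1";

    // Constant-first equals() stays safe when a getter returns null.
    public static boolean isProduct(Element element) {
        return "product".equals(element.getLocalName())
            && NS.equals(element.getNamespaceURI());
    }

    // Parses a string with a namespace-aware parser and returns the root element.
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Element inNamespace = parseRoot("<product xmlns='" + NS + "'/>");
        Element noNamespace = parseRoot("<product/>");
        System.out.println(isProduct(inNamespace)); // true
        System.out.println(isProduct(noNamespace)); // false, and no exception
    }
}
```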
The second difference is that Java is a strongly typed language. Typecasting between Node and Node descendants is very frequent, whereas in JavaScript the typecasting was implicit. The getText() method, for example, casts each child from Node to Text:
protected String getText(Node node)
{
    StringBuffer buffer = new StringBuffer();
    NodeList children = node.getChildNodes();
    for(int i = 0;i < children.getLength();i++)
    {
        // typecast from Node to Text
        Text text = (Text)children.item(i);
        buffer.append(text.getData());
    }
    return buffer.toString();
}
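The cast in getText() succeeds only when every child of the node is a Text node. If an element also contained child elements or comments, the cast would fail with a ClassCastException. The following sketch shows one possible guard, using instanceof before the cast. The class TextCollector and the sample document are hypothetical, and this is a variation for illustration rather than the code of Listing 7.7.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TextCollector {

    // Concatenates the direct Text children of a node and skips
    // every other kind of child (elements, comments, and so on).
    public static String getText(Node node) {
        StringBuilder buffer = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) {
                // the cast is safe only after the instanceof test
                buffer.append(((Text) child).getData());
            }
        }
        return buffer.toString();
    }

    // Parses a string and returns the root element.
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // Mixed content: a text node, an element, then another text node.
        Element root = parseRoot("<product>XML <b>bold</b>Editor</product>");
        System.out.println(getText(root)); // XML Editor
    }
}
```

The text inside the nested b element is not collected, because only direct children are examined.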
The third difference is in how you start the parser. Although DOM does not define standard methods to load documents, Sun does, in the Java API for XML Processing (JAXP). The application uses the Sun-defined DocumentBuilderFactory and DocumentBuilder classes, from the javax.xml.parsers package, to load a document.
Loading a document is a two-step process. First, create a new DocumentBuilderFactory through the newInstance() method. Then set various properties: in this case, enable namespace processing and select a nonvalidating parser.
Next, use the newDocumentBuilder() method to acquire a DocumentBuilder object. Call its parse() method with a File object. parse() returns a Document object and you're back in DOM land:
DocumentBuilderFactory factory =
    DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(args[0]));
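To see why the namespace property matters, the same two steps can be tried on a document held in a string. The sketch below parses one document twice, once with namespace processing on and once with it off. The class LoadSketch and the sample document are hypothetical, made up for this illustration; the sample merely imitates the product element that Listing 7.7 looks for.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class LoadSketch {

    // Hypothetical sample document, not taken from the book's listings.
    public static final String SAMPLE =
        "<p:product xmlns:p='http://www.psol.com/xbe2/listing7.1'"
        + " price='100'>XML Editor</p:product>";

    public static Document load(String xml, boolean namespaceAware) {
        try {
            // Step 1: create and configure the factory.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.setValidating(false);
            // Step 2: acquire a builder and parse.
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not load document", e);
        }
    }

    public static void main(String[] args) {
        Element aware = load(SAMPLE, true).getDocumentElement();
        // With namespace processing, the prefix is resolved to a URI.
        System.out.println(aware.getLocalName());
        System.out.println(aware.getNamespaceURI());
        System.out.println(aware.getAttributeNS(null, "price"));

        Element unaware = load(SAMPLE, false).getDocumentElement();
        // Without it, only the raw qualified name is reliable.
        System.out.println(unaware.getNodeName());
    }
}
```

A factory is not namespace-aware by default, so tests on getLocalName() and getNamespaceURI() like those in searchPrice() depend on the setNamespaceAware(true) call.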
The Parser
Listing 7.7 was written using the Xerces parser for Java available from xml.apache.org. Xerces is a popular open-source parser. It was originally developed by IBM and is now supported by the Apache Software Foundation. Xerces is a very useful tool because it supports both DOM and SAX (the event-based interface).
If you download the listings from http://www.marchal.com, the download includes a copy of Xerces in the file xerces.jar.
Other Java parsers are available from Oracle (otn.oracle.com/tech/xml) and from James Clark (http://www.jclark.com).