17.5 Xerces Java DOM In-depth
This section plunges into the Java DOM, using Xerces to construct a couple of larger applications. It also presents the critical packages and classes you will most likely use while building XML DOM applications. Included are APIs for the three most important DOM interfaces, along with sample code that explores many of their interesting methods.
17.5.1 The Document Interface
Xerces2 implements the DOM Level 2 API, which builds on the original Level 1 core. As you know, DOM is most useful when a new XML document must be created from scratch or when a parsed document must be kept in memory, presumably to be manipulated at a later time. The common Xerces DOM API elements are found in the package org.w3c.dom and its subpackages.
The org.w3c.dom package (whose interfaces are described in the previous section) is the starting point from which to construct Java DOM code. Look through Table 17.4, which describes the package org.w3c.dom, and see if you can work out which three interfaces are most important to your task of building DOM applications.
As you saw in Listing 17.2, an object that implements the Document interface is returned from the DocumentBuilder object via a call to either the newDocument or parse method. Once you have an object of type Document, you may call any of the methods in Table 17.11 to create actual XML elements (the create* methods) or access elements (the get* methods).
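The two routes to a Document mentioned above can be sketched with the JAXP classes alone. This is a minimal illustration, not Listing 17.2 itself: the class name DocumentSources and its helper methods are hypothetical, and the sketch assumes whatever parser the JDK supplies by default rather than a standalone Xerces installation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class DocumentSources {
    // Builds a DocumentBuilder; checked exceptions are wrapped to keep the sketch short.
    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Route 1: an empty Document, ready for the create* methods.
    static Document emptyDocument() {
        return builder().newDocument();
    }

    // Route 2: a Document populated by parsing existing XML text.
    static Document parsedDocument(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document empty = emptyDocument();
        Document parsed = parsedDocument("<BOOK><AUTHOR>Bachelard.Gaston</AUTHOR></BOOK>");
        System.out.println(empty.getDocumentElement());               // null: nothing created yet
        System.out.println(parsed.getDocumentElement().getTagName()); // BOOK
    }
}
```

A freshly created Document has no root Element until you append one, which is why the get* methods are mostly useful on the parsed variant.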
TABLE 17.11 The org.w3c.dom.Document Interface
Return Value | Method Name (parameters) | Explanation
Attr | createAttribute(String name) | 'Attr' is short for 'Attribute'. Here you can create an Attr with a name of your choice and then set its value with a call to setValue(String).
Attr | createAttributeNS(String namespaceURI, String qualifiedName) | If you want to qualify the Attribute with a particular namespace URI, use this method.
CDATASection | createCDATASection(String data) | Create a CDATASection and set the value of its data by using this method.
Comment | createComment(String data) | Create a Comment and set the value of its data by using this method.
DocumentFragment | createDocumentFragment() | A DocumentFragment object is a lightweight version of an XML document. If you want to move portions of a document around, you can use a DocumentFragment object for this purpose instead of using just a general Node object.
Element | createElement(String tagName) | Here is how we commonly create XML document elements, which you can manipulate with one of many method calls. (Remember, Element extends the Node interface, so all of its methods are available.)
Element | createElementNS(String namespaceURI, String qualifiedName) | If you want to qualify the Element's type name with a particular namespace URI, use this method.
EntityReference | createEntityReference(String name) | An EntityReference object points to an Entity declared elsewhere in the XML document. Think of it as a shortcut.
ProcessingInstruction | createProcessingInstruction(String target, String data) | Processing instructions are application-specific commands (in XML documents, they are wrapped between '<?' and '?>'). Create them and set their target and data here.
Text | createTextNode(String data) | Create a Text node and set its data here. An Element typically holds its character data in a Text child.
DocumentType | getDoctype() | This returns the Document Type Declaration associated with this document.
Element | getDocumentElement() | This returns the root Element of the document, which you can then access.
Element | getElementById(String elementId) | This returns an Element based on its ID (ID is a specific XML attribute type).
NodeList | getElementsByTagName(String tagname) | All Elements with a given tag name are returned in a NodeList, in the order in which they are encountered in a pre-order traversal of the Document tree.
NodeList | getElementsByTagNameNS(String namespaceURI, String localName) | Same as getElementsByTagName, except only Elements with a matching namespace are returned in the NodeList.
DOMImplementation | getImplementation() | Returns the DOMImplementation object that handles this document. Useful for testing whether the DOM implementation supports certain features through the returned interface's hasFeature(String feature, String version) method.
Node | importNode(Node importedNode, boolean deep) | Imports a Node from another document to this document. The source node is not altered, and the node returned from this call has no parent node.
Plus all methods from org.w3c.dom.Node.
The last line in Table 17.11 is important. The Document interface also extends a lower interface: Node. As you have probably guessed, Node is the second of the three critical org.w3c.dom interfaces. Before you get overwhelmed with the APIs, however, practice with a little bit of code by creating a more interesting XML document, with many of the features described in the Document interface in Table 17.11.
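Two of the less obvious entries in Table 17.11, importNode and getImplementation, can be tried in isolation first. The following is a minimal sketch under stated assumptions: the class name DocumentMethodsSketch and its helpers are hypothetical, and the parser is the JDK's default JAXP implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name, for illustration only.
class DocumentMethodsSketch {
    static Document newDoc() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies an AUTHOR element from one Document into another and returns the target root.
    static Element importAuthor() {
        Document source = newDoc();
        Element author = source.createElement("AUTHOR");
        author.appendChild(source.createTextNode("Bachelard.Gaston"));
        source.appendChild(author);

        Document target = newDoc();
        Element book = target.createElement("BOOK");
        target.appendChild(book);

        // deep = true copies the Text child too; the copy has no parent until it is appended.
        Node copy = target.importNode(author, true);
        book.appendChild(copy);
        return book;
    }

    // Asks the implementation behind a Document whether it claims DOM Level 2 Core support.
    static boolean supportsCoreLevel2() {
        return newDoc().getImplementation().hasFeature("Core", "2.0");
    }

    public static void main(String[] args) {
        Element book = importAuthor();
        System.out.println(book.getFirstChild().getFirstChild().getNodeValue()); // Bachelard.Gaston
        System.out.println(supportsCoreLevel2());
    }
}
```

Appending a node that belongs to a different Document without importing it first is an error in DOM Level 2, which is why importNode exists.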
17.5.2 Creating DOM Documents
In this section, you create your first XML elements. You do this by using a Document object that is returned from the DocumentBuilder (just like HelloApache2). Once you have the Document object, you can use that object to create most XML nodes, such as Text, CDATA, and of course Elements themselves. Every XML node has a corresponding create method in the Document object. For example, to create a Comment, you call createComment, with the parameter being the actual comment text. (There is one caveat, which concerns attributes. See Listing 17.3; we will explain afterward.)
When instantiated, all XML components must be attached to one another; because they are separate objects, they must be able to refer to one another. For instance, once you have created an Element and then a Text node for that Element, you must link them together. You do this by using the appendChild method, available in each object. Eventually the root Element itself must be attached to something: the Document. Then you can serialize out the Document object, as in Listing 17.2.
To add a twist, the code also specifies a namespace for this Element; you will see the effects after you compile and run the example. Also, instead of using a StringWriter, as in HelloApache2, you use a FileOutputStream. This lets you persist the finalized DOM document out to an XML file, which you can then access and transmit as you prefer (see the XMLSerializer class API in the official documentation to discover all the output options you have available).
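Before the full listing, the linking step on its own can be sketched as follows. The class name AppendChildSketch is hypothetical, and the sketch leaves out namespaces and serialization so that only the create-then-attach pattern remains.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical class name, for illustration only.
class AppendChildSketch {
    // Creates BOOK > AUTHOR > text and attaches each node to its parent.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("BOOK");
        Element author = doc.createElement("AUTHOR");
        Text name = doc.createTextNode("Bachelard.Gaston");

        // Until appendChild is called, each node is free-standing: name.getParentNode() is null.
        author.appendChild(name);   // Text becomes a child of AUTHOR
        root.appendChild(author);   // AUTHOR becomes a child of BOOK
        doc.appendChild(root);      // BOOK becomes the document's root Element
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        Element root = doc.getDocumentElement();
        System.out.println(root.getNodeName());                                  // BOOK
        System.out.println(root.getFirstChild().getFirstChild().getNodeValue()); // Bachelard.Gaston
    }
}
```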
LISTING 17.3 HelloApacheDOM Example
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Attr;
import org.w3c.dom.Comment;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Text;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.Serializer;
import org.apache.xml.serialize.XMLSerializer;
import java.io.FileOutputStream;

public class HelloApacheDOM {
    public static void main(String[] args) {
        try {
            javax.xml.parsers.DocumentBuilderFactory dbf =
                javax.xml.parsers.DocumentBuilderFactory.newInstance();
            javax.xml.parsers.DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.newDocument();

            // Create the parent Element object, and add a Comment
            Element root = doc.createElementNS("http://www.galtenberg.net", "books:BOOK");
            Comment comment = doc.createComment("Publisher 'Beacon Press' address is unknown");
            root.appendChild(comment);

            // Create a child Element with its own Text
            Element item = doc.createElementNS("http://www.galtenberg.net", "books:AUTHOR");
            Text text = doc.createTextNode("Bachelard.Gaston");
            item.appendChild(text);
            root.appendChild(item);

            // Do the same as above, but this time add Attributes
            item = doc.createElementNS("http://www.galtenberg.net", "books:TITLE");
            text = doc.createTextNode("The Poetics of Reverie");
            Attr attrib = doc.createAttributeNS("http://www.galtenberg.net", "books:ISBN");
            attrib.setValue("0-8070-6413-0");
            // Attributes are attached differently
            item.setAttributeNodeNS(attrib);
            item.appendChild(text);
            root.appendChild(item);

            item = doc.createElementNS("http://www.galtenberg.net", "books:TRANSLATOR");
            attrib = doc.createAttributeNS("http://www.galtenberg.net", "books:ORIGINAL-LANGUAGE");
            attrib.setValue("French");
            item.setAttributeNodeNS(attrib);
            item.appendChild(doc.createTextNode("Daniel Russell"));
            root.appendChild(item);

            item = doc.createElementNS("http://www.galtenberg.net", "books:EXCERPT");
            CDATASection cdata = doc.createCDATASection("One does not dream with taught ideas.");
            item.appendChild(cdata);
            root.appendChild(item);

            // Attach the root Element to the Document itself
            doc.appendChild(root);

            // Serialize the finished Document out to a file
            OutputFormat format = new OutputFormat(doc);
            FileOutputStream fs = new FileOutputStream("d:\\helloapache.xml");
            XMLSerializer serial = new XMLSerializer(fs, format);
            serial.serialize(doc);
            fs.close(); // release the file handle once serialization is done
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
When you get into this example, you will see that the fundamentals of building XML DOM applications are pretty straightforward. Decide which XML node you would like to utilize, import that interface, declare a type, use the Document object to create the node, fill in its data, and attach it to the appropriate Element, like this:
Element root = doc.createElementNS("http://www.galtenberg.net", "books:BOOK");
Comment comment = doc.createComment("Publisher 'Beacon Press' address is unknown");
root.appendChild(comment);
How else could you have created this Element (maybe without the namespace)? Look back to Table 17.11, which describes the Document interface. Yes, you could call doc.createElement("BOOK");.
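The difference between the two creation methods shows up in what the resulting Element reports about itself. The sketch below, with the hypothetical class name ElementCreationSketch, prints the node name, local name, and namespace URI for each variant; per DOM Level 2, nodes created with a Level 1 method such as createElement have no local name and no namespace.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, for illustration only.
class ElementCreationSketch {
    static Document newDoc() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Summarizes the naming information an Element carries.
    static String describe(Element e) {
        return e.getNodeName() + " | " + e.getLocalName() + " | " + e.getNamespaceURI();
    }

    static Element plain() {
        return newDoc().createElement("BOOK");
    }

    static Element qualified() {
        return newDoc().createElementNS("http://www.galtenberg.net", "books:BOOK");
    }

    public static void main(String[] args) {
        System.out.println(describe(plain()));     // BOOK | null | null
        System.out.println(describe(qualified())); // books:BOOK | BOOK | http://www.galtenberg.net
    }
}
```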
Now compile and execute your sample. On your D: drive (change the path in the code if you want the file somewhere else), you should now see the file helloapache.xml. When you open it in a Web browser, it should look like Figure 17.7.
FIGURE 17.7 Viewing output of HelloApacheDOM example in Web browser.
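The XMLSerializer used in Listing 17.3 ships with Xerces itself. Where only the JDK is on the classpath, one possible substitute (an assumption on our part, not what the listing does) is the JAXP identity transform from javax.xml.transform. The sketch below writes a small namespaced document to a temporary file through a FileOutputStream and reads it back; the class name SaveDomSketch and the temporary file are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, for illustration only.
class SaveDomSketch {
    // Builds a one-element document, saves it to a temp file, and returns the file's text.
    static String saveAndReadBack() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS("http://www.galtenberg.net", "books:BOOK");
            root.appendChild(doc.createTextNode("The Poetics of Reverie"));
            doc.appendChild(root);

            File out = File.createTempFile("helloapache", ".xml");
            out.deleteOnExit();

            // A Transformer created without a stylesheet copies its input to its output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            try (FileOutputStream fs = new FileOutputStream(out)) {
                identity.transform(new DOMSource(doc), new StreamResult(fs));
            }
            return new String(Files.readAllBytes(out.toPath()), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(saveAndReadBack());
    }
}
```

The try-with-resources block closes the stream even if the transform fails, which the plain FileOutputStream usage in the listing does not guarantee.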
17.5.3 The Element Interface
As mentioned earlier, there is one caveat in the HelloApacheDOM example, which is a good lesson if you are new to the XML APIs. When you created an XML attribute (using the Attr interface), you did the normal things: You asked the Document to return you an Attr object and then you filled in the attribute's value. But unlike with other XML components, you did not call appendChild on an Element. Here is what you did instead:
Element item = doc.createElementNS("http://www.galtenberg.net", "books:TRANSLATOR");
Attr attrib = doc.createAttributeNS("http://www.galtenberg.net", "books:ORIGINAL-LANGUAGE");
attrib.setValue("French");
item.setAttributeNodeNS(attrib);
Instead of calling appendChild on item, you called setAttributeNodeNS with your Attr object. Elements have their own interface, which deals primarily with XML attributes. Table 17.12 contains org.w3c.dom.Element, which is the other critical interface with which you should become acquainted.
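The distinction can be checked directly. In the following sketch (class and method names are hypothetical), setAttributeNS is used as a one-step alternative to the createAttributeNS, setValue, setAttributeNodeNS sequence, and a second method confirms that appendChild refuses an Attr. The DOM specification designates HIERARCHY_REQUEST_ERR for that refusal; the sketch only checks that a DOMException is raised.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, for illustration only.
class AttributeAttachSketch {
    static final String NS = "http://www.galtenberg.net";

    static Document newDoc() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Attaches a namespaced attribute in a single call.
    static Element translator() {
        Element item = newDoc().createElementNS(NS, "books:TRANSLATOR");
        item.setAttributeNS(NS, "books:ORIGINAL-LANGUAGE", "French");
        return item;
    }

    // Returns true if appendChild rejects an Attr, as the DOM specification requires.
    static boolean appendChildRejectsAttr() {
        Document doc = newDoc();
        Element item = doc.createElement("TRANSLATOR");
        Attr attr = doc.createAttribute("ORIGINAL-LANGUAGE");
        try {
            item.appendChild(attr);
            return false;
        } catch (DOMException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Element item = translator();
        System.out.println(item.getAttributeNS(NS, "ORIGINAL-LANGUAGE")); // French
        System.out.println(item.getChildNodes().getLength());             // 0: attributes are not children
        System.out.println(appendChildRejectsAttr());
    }
}
```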
TABLE 17.12 The org.w3c.dom.Element Interface
Return Value | Method Name (parameters)
String | getAttribute(String name)
Attr | getAttributeNode(String name)
Attr | getAttributeNodeNS(String namespaceURI, String localName)
String | getAttributeNS(String namespaceURI, String localName)
NodeList | getElementsByTagName(String name)
NodeList | getElementsByTagNameNS(String namespaceURI, String localName)
String | getTagName()
boolean | hasAttribute(String name)
boolean | hasAttributeNS(String namespaceURI, String localName)
void | removeAttribute(String name)
Attr | removeAttributeNode(Attr oldAttr)
void | removeAttributeNS(String namespaceURI, String localName)
void | setAttribute(String name, String value)
Attr | setAttributeNode(Attr newAttr)
Attr | setAttributeNodeNS(Attr newAttr)
void | setAttributeNS(String namespaceURI, String qualifiedName, String value)
Plus all methods from org.w3c.dom.Node.
We omitted the explanation for the methods in Table 17.12 because the naming and parameter patterns are pretty basic. If you want to explore these methods in detail, see the official API documentation.
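For a quick feel of the non-namespace methods in Table 17.12, here is a small sketch that sets, reads, and removes an attribute. The class name ElementAttributesSketch is hypothetical; the attribute value is borrowed from Listing 17.3.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, for illustration only.
class ElementAttributesSketch {
    // Creates a TITLE element carrying one ISBN attribute.
    static Element title() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element title = doc.createElement("TITLE");
            title.setAttribute("ISBN", "0-8070-6413-0");
            return title;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element title = title();
        System.out.println(title.getTagName());                          // TITLE
        System.out.println(title.hasAttribute("ISBN"));                  // true
        System.out.println(title.getAttribute("ISBN"));                  // 0-8070-6413-0
        System.out.println(title.getAttributeNode("ISBN").getValue());   // 0-8070-6413-0

        title.removeAttribute("ISBN");
        System.out.println(title.hasAttribute("ISBN"));                  // false
    }
}
```

The String-returning getters hand back the value directly, while the *Node variants return the Attr object itself, which is handy when you want to move or inspect the attribute.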
17.5.4 The Node Interface
That leaves one more critical DOM interface to master. Nearly all the XML types returned from the methods in Document (Attr, CDATASection, CharacterData, Comment, DocumentFragment, DocumentType, Element, Entity, EntityReference, ProcessingInstruction, Text) extend the Node interface. You already know the first method, appendChild, well. The rest of the interface methods (from the API documentation) are listed in Table 17.13.
TABLE 17.13 The org.w3c.dom.Node Interface
Return Value | Method Name (parameters) | Explanation
Node | appendChild(Node newChild) | Adds the node newChild to the end of the list of children of this node.
Node | cloneNode(boolean deep) | Returns a duplicate of this node; that is, serves as a generic copy constructor for nodes.
NamedNodeMap | getAttributes() | A NamedNodeMap containing the attributes of this node (if an Element), or null otherwise.
NodeList | getChildNodes() | A NodeList that contains all children of this node.
Node | getFirstChild() | The first child of this node.
Node | getLastChild() | The last child of this node.
String | getLocalName() | Returns the local part of the qualified name of this node.
String | getNamespaceURI() | The namespace URI of this node, or null if unspecified.
Node | getNextSibling() | The node immediately following this node.
String | getNodeName() | The name of this node, depending on its type.
short | getNodeType() | A code representing the type of the underlying object. (See the official API documentation for a listing of the potential types.)
String | getNodeValue() | The value of this node, depending on its type.
Document | getOwnerDocument() | The Document object associated with this node.
Node | getParentNode() | The parent of this node.
String | getPrefix() | The namespace prefix of this node, or null if unspecified.
Node | getPreviousSibling() | The node immediately preceding this node.
boolean | hasAttributes() | Returns whether this node (if an element) has any attributes.
boolean | hasChildNodes() | Returns whether this node has any children.
Node | insertBefore(Node newChild, Node refChild) | Inserts the node newChild before the existing child node refChild.
boolean | isSupported(String feature, String version) | Tests whether the DOM implementation implements a specific feature and that feature is supported by this node.
void | normalize() | Puts all text nodes in the full depth of the subtree underneath this node, including attribute nodes, into a "normal" form where only structure (for example, elements, comments, processing instructions, CDATA sections, and entity references) separates text nodes. That is, there are neither adjacent text nodes nor empty text nodes.
Node | removeChild(Node oldChild) | Removes the child node indicated by oldChild from the list of children and returns it.
Node | replaceChild(Node newChild, Node oldChild) | Replaces the child node oldChild with newChild in the list of children and returns the oldChild node.
void | setNodeValue(String nodeValue) | Sets the value of this node, depending on its type.
void | setPrefix(String prefix) | Sets the namespace prefix of this node.
Note how the Node interface only deals with "its own kind" (other Nodes and Node-utility classes). Because it is the base XML DOM interface, this makes sense. All of the methods listed in Table 17.13 are available in the XML node classes, so we highly recommend that you get comfortable with Node.
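Because every node type shares these methods, a single recursive routine can walk any document. The sketch below is one way to do it, using getFirstChild, getNextSibling, getNodeType, and getNodeName; the class name NodeWalkSketch, the helper names, and the sample XML are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class NodeWalkSketch {
    // Maps a few of the Node type codes to readable labels.
    static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:      return "DOCUMENT";
            case Node.ELEMENT_NODE:       return "ELEMENT";
            case Node.TEXT_NODE:          return "TEXT";
            case Node.COMMENT_NODE:       return "COMMENT";
            case Node.CDATA_SECTION_NODE: return "CDATA";
            default:                      return "OTHER";
        }
    }

    // Appends one indented line per node, visiting children in document order.
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(typeName(node.getNodeType())).append(' ').append(node.getNodeName()).append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    // Parses the XML text and returns the outline produced by walk.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<BOOK><!--note--><AUTHOR>Bachelard.Gaston</AUTHOR></BOOK>"));
    }
}
```

Note that the walk starts at the Document itself, which is also a Node, and that attributes never appear in the outline because they are not children.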
17.5.5 An Advanced DOM Example
There is one more DOM example in this chapter to demonstrate the accessing, parsing, and traversing of an XML document. But this time, you generate your own custom report.
In this example, instead of outputting everything you find in the document, you sort through the Document object to find the components that interest you. To do this, you use the NamedNodeMap and NodeList utility classes.
Think of these as loose counterparts of the java.util collections. A NamedNodeMap, like a map, pairs a key with a value (use the node name as the key to retrieve the node). A NodeList, like a list, is an ordered sequence (walk it by index until there are no more nodes). The NamedNodeMap and NodeList interfaces are quite simple; see the API documentation for the org.w3c.dom package.
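Both interfaces can be exercised in a few lines. In this sketch (the class name ListAndMapSketch, its helpers, and the sample XML are hypothetical), a NodeList is walked by index with getLength and item, and a NamedNodeMap is queried by key with getNamedItem.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class ListAndMapSketch {
    // Parses the XML text and returns its root Element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // NodeList: index-based access, in document order.
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    // NamedNodeMap: key-based access; returns null when the attribute is absent.
    static String attributeValue(Element element, String name) {
        NamedNodeMap attributes = element.getAttributes();
        Node attribute = attributes.getNamedItem(name);
        return attribute == null ? null : attribute.getNodeValue();
    }

    public static void main(String[] args) {
        Element book = root("<BOOK ISBN=\"0-8070-6413-0\"><AUTHOR/><TITLE/></BOOK>");
        System.out.println(childNames(book));             // [AUTHOR, TITLE]
        System.out.println(attributeValue(book, "ISBN")); // 0-8070-6413-0
    }
}
```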
One other thing to remember: The specific XML objects, such as those of type Text, Comment, and CDATASection, were attached as children to their parent Element. They must be accessed in the same way. A NodeList returned by getElementsByTagName holds Elements (and the NamedNodeMap returned by getAttributes holds Attr nodes); you must go one level deeper to access an Element's data. Also, you can check the type of the Node with a call to getNodeType (a list of the possible types can be found in the Node interface API) to confirm that the child you have accessed is the correct one.
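The "one level deeper" rule can be captured in a small helper. The following sketch is an illustration only: the class name TextOfSketch and the method textOf are hypothetical, and the helper looks only at the first child of the first matching Element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class TextOfSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the character data held by the first child of the first matching Element,
    // or null if there is no such Element or the child is not Text or CDATA.
    static String textOf(Document doc, String tagName) {
        NodeList matches = doc.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return null;
        }
        // The Element's own getNodeValue() is null; its data lives in a child node.
        Node child = matches.item(0).getFirstChild();
        if (child == null) {
            return null;
        }
        short type = child.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return child.getNodeValue();
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse("<BOOK><AUTHOR>Bachelard.Gaston</AUTHOR>"
                + "<EXCERPT><![CDATA[One does not dream with taught ideas.]]></EXCERPT></BOOK>");
        System.out.println(textOf(doc, "AUTHOR"));  // Bachelard.Gaston
        System.out.println(textOf(doc, "EXCERPT")); // One does not dream with taught ideas.
        System.out.println(textOf(doc, "PRICE"));   // null
    }
}
```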
Listing 17.4 shows one possible implementation of a program that generates a report using the XML data created from Listing 17.3.
LISTING 17.4 HelloApacheDOM2 Example
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.NamedNodeMap;

public class HelloApacheDOM2 {
    public static void main(String[] args) {
        try {
            javax.xml.parsers.DocumentBuilderFactory dbf =
                javax.xml.parsers.DocumentBuilderFactory.newInstance();
            javax.xml.parsers.DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(args[0]);

            // Display the root Element (an Element's node value is always null)
            Element root = doc.getDocumentElement();
            System.out.println("\nDocument Element: Name = " + root.getNodeName()
                + ", Value = " + root.getNodeValue());

            // Traverse through list of the root Element's Attributes
            NamedNodeMap nnm = root.getAttributes();
            System.out.println("# of Attributes: " + nnm.getLength());
            for (int x = 0; x < nnm.getLength(); x++) {
                Node n = nnm.item(x); // index with x so that every attribute is visited
                System.out.println("Attribute: Name = " + n.getNodeName()
                    + ", Value = " + n.getNodeValue());
            }

            // Retrieve author name (Text is a child!)
            NodeList elementList = root.getElementsByTagName("books:AUTHOR");
            String authorName = elementList.item(0).getFirstChild().getNodeValue();

            // Do the same for the title
            elementList = root.getElementsByTagName("books:TITLE");
            String bookName = elementList.item(0).getFirstChild().getNodeValue();

            // Pull the quote out
            elementList = doc.getElementsByTagName("books:EXCERPT");
            for (int x = 0; x < elementList.getLength(); x++) {
                // This node is books:EXCERPT
                Node node = elementList.item(x);
                // Access CDATA underneath (remember, it's a child)
                Node childNode = node.getFirstChild();
                if (childNode.getNodeType() != Node.CDATA_SECTION_NODE) {
                    throw new Exception("This element is not CDATA!");
                }
                System.out.println("\nBook excerpt:");
                String value = childNode.getNodeValue();
                System.out.println(value);
                System.out.println(" " + authorName + ", " + bookName);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Compiling and executing this code should result in the information shown in Figure 17.8 (make sure to add the path to the XML file as an argument).
FIGURE 17.8 Viewing output of HelloApacheDOM2 example.
TABLE 17.14 Java Packages for Advanced DOM Functionality
Advanced DOM Package | Functionality
org.w3c.dom.events | Contains five interfaces (DocumentEvent, Event, EventListener, EventTarget, MutationEvent) to provide a generic event system. Useful for defining specific parsing and traversal functionality.
org.w3c.dom.html | Contains dozens of interfaces that let you build an HTML document just as you would an XML document; this functionality not only supports the DOM Level 0 specification but may also simplify common and frequent HTML operations.
org.w3c.dom.ranges | Contains two interfaces (DocumentRange, Range) that let you identify and manipulate a range of document content.
org.w3c.dom.traversal | Contains four interfaces (DocumentTraversal, NodeFilter, NodeIterator, TreeWalker) that let you dynamically identify, traverse, and filter a selected range of document content.
17.5.6 DOM Helpers and DOM Level 3
Before moving on from DOM, know that there are also subpackages that provide advanced DOM Level 2 functionality. Take a look at the packages listed in Table 17.14 when you are comfortable with the previous samples and interfaces. Also, if you want to explore DOM Level 3 functionality, see the org.apache.xerces.dom3 package (and its subpackages) for the latest interfaces and behaviors. Note that this is a parser-specific package whose APIs might change, but if you require the abstract schema and load-and-save features described in the DOM Level 3 working drafts, at least you have this foothold. Make sure to keep current with new Xerces updates. (There are two Xerces discussion groups you can join to receive the latest news, bug reports, fixes, and releases. Click Mailing Lists on the main panel at http://xml.apache.org for subscription instructions.)
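As a taste of the traversal package from Table 17.14, the sketch below uses a TreeWalker to visit only the Elements of a document, skipping comments and text. It assumes the Document returned by the parser also implements DocumentTraversal, which holds for Xerces-based parsers but is not guaranteed by JAXP, so the code checks before casting. The class name TraversalSketch and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical class name, for illustration only.
class TraversalSketch {
    // Returns the names of all Elements in document order, starting at the root.
    static List<String> elementNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // Assumption: the parser's Document supports the optional Traversal module.
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("This DOM does not support traversal");
        }
        DocumentTraversal traversal = (DocumentTraversal) doc;

        // SHOW_ELEMENT hides every node type except Element; no custom filter is needed.
        TreeWalker walker = traversal.createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, false);

        List<String> names = new ArrayList<>();
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(
                "<BOOK><!--note--><AUTHOR>Bachelard.Gaston</AUTHOR><TITLE>Reverie</TITLE></BOOK>"));
        // [BOOK, AUTHOR, TITLE]
    }
}
```

Compared with walking getFirstChild and getNextSibling by hand, the walker moves the type checks into the whatToShow mask, which keeps report-style code shorter.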