Parents and Children
The node relationships of the DOM are a direct manifestation of the [parent]/[children] relationships of the Infoset. The Node interface provides a set of attributes and methods that correspond to these two Infoset properties.
interface Node {
  readonly attribute Node     parentNode;
  readonly attribute Node     firstChild;
  readonly attribute Node     lastChild;
  readonly attribute Node     previousSibling;
  readonly attribute Node     nextSibling;
  readonly attribute NodeList childNodes;
  boolean hasChildNodes();
  readonly attribute Document ownerDocument;
  : : :
};
Figure 2.7 illustrates how these attributes relate to a given node in a document. Because the Node interface is so general, some aspects of it do not apply to every node type. For example, the document information item does not have a [parent] property. For that reason, the Node.parentNode attribute will always evaluate to null for nodes of type Document. Similarly, since processing instruction information items do not have a [children] property, the Node.firstChild and Node.lastChild attributes of ProcessingInstruction nodes will always evaluate to null.
Figure 2.7. The Node interface's [parent]/[children] properties
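The two special cases just described can be checked with a short sketch. It assumes the JAXP parser bundled with the JDK; the class name NodeEdgeCases, the helpers newDocument and isChildless, and the processing instruction's target and data are illustrative choices, not part of the DOM.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class NodeEdgeCases {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the node reports no children through any of the accessors.
    static boolean isChildless(Node n) {
        return n.getFirstChild() == null
            && n.getLastChild() == null
            && !n.hasChildNodes();
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node pi = doc.createProcessingInstruction(
            "xml-stylesheet", "href=\"art.xsl\"");
        doc.appendChild(pi);

        // A Document never has a parent.
        System.out.println("document parent is null: "
            + (doc.getParentNode() == null));
        // A ProcessingInstruction never has children, but it does have a parent.
        System.out.println("PI is childless: " + isChildless(pi));
        System.out.println("PI parent is the document: "
            + (pi.getParentNode() == doc));
    }
}
```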
The Node interface was designed to make document traversal uniform and simple. For example, the following Java method performs a depth-first traversal of the given node:
void traverseTree(Node current) {
  myProcessNode(current);
  for (Node child = current.getFirstChild();
       child != null;
       child = child.getNextSibling())
    traverseTree(child);
}
This depth-first traversal corresponds exactly to the document-order of [children] properties in the Infoset. The following Java method performs the traversal in reverse-document-order:
void traverseTreeReverse(Node current) {
  // start at the last child and walk backward through the siblings
  for (Node child = current.getLastChild();
       child != null;
       child = child.getPreviousSibling())
    traverseTreeReverse(child);
  myProcessNode(current);
}
Note that both of these examples use a sequential access pattern much like that imposed by a linked-list. It is also possible to use a more array-like random-access pattern using the NodeList interface.
The DOM defines the NodeList interface to provide random access to an ordered collection of nodes. The NodeList interface is defined as follows:
interface NodeList {
  Node item(in unsigned long index);
  readonly attribute unsigned long length;
};
The [children] property of the current node can be accessed using the NodeList interface via the Node.childNodes attribute. The traverseTree method shown earlier can be rewritten as follows:
void traverseTree(Node current) {
  myProcessNode(current);
  NodeList children = current.getChildNodes();
  for (int i = 0; i < children.getLength(); i++)
    traverseTree(children.item(i));
}
The traverseTreeReverse method can also be rewritten using the NodeList interface:
void traverseTreeReverse(Node current) {
  NodeList children = current.getChildNodes();
  for (int i = children.getLength() - 1; i >= 0; i--)
    traverseTreeReverse(children.item(i));
  myProcessNode(current);
}
For these two examples, the difference is largely stylistic. However, if random access to the i-th child node is desired, then the NodeList approach will likely be considerably faster.
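To make the two access patterns concrete, the following sketch fetches the i-th child both ways. The class ChildAccess and its methods childAt and childAtBySiblings are hypothetical names introduced here, and whether NodeList.item is actually faster depends on the DOM implementation in use, so the sketch should be read as a comparison of style rather than a benchmark.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ChildAccess {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Array-style access: ask the NodeList for position i directly.
    static Node childAt(Node parent, int i) {
        NodeList children = parent.getChildNodes();
        return (i >= 0 && i < children.getLength()) ? children.item(i) : null;
    }

    // Linked-list-style access: step through i siblings from the first child.
    static Node childAtBySiblings(Node parent, int i) {
        if (i < 0) return null;
        Node child = parent.getFirstChild();
        while (child != null && i-- > 0)
            child = child.getNextSibling();
        return child;
    }
}
```

Both helpers return null when the index is out of range, so callers can treat them interchangeably.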
Independent of the [parent]/[children] relationship, DOM nodes always belong to a particular Document. As soon as a new node is created through one of the Document's factory methods, that node is automatically associated with the Document that created it. This association cannot be changed and is made explicit via the node's ownerDocument attribute.
void create(org.w3c.dom.Document doc) {
  Node n = doc.createElementNS("http://books.org", "author");
  // the Java binding exposes the ownerDocument attribute as getOwnerDocument()
  assert (n.getOwnerDocument() == doc);
}
Note that the value of the Node.ownerDocument attribute cannot be changed and will be the same for all nodes within a document.
The Node interface also provides a set of methods for manipulating the DOM hierarchy. These methods enforce the type constraints of the [children] properties. For example, one can use these methods to add a Text node child to an Element node but not to a Document node. The definition of these manipulation methods is as follows:
interface Node {
  Node insertBefore(in Node newChild, in Node refChild);
  Node appendChild(in Node newChild);
  Node removeChild(in Node oldChild);
  Node replaceChild(in Node newChild, in Node oldChild);
  Node cloneNode(in boolean deep);
  ...
};
Both the Node.insertBefore and Node.appendChild methods are used to add nodes to the hierarchy.
Node.appendChild simply adds the new child to the end of the list of children, whereas Node.insertBefore allows you to specify where the new node should appear in the sequence of children. Both methods return the node that was inserted. For example, the following code adds a new artist element node to the end of period's children:
import org.w3c.dom.*;

void addNewArtist(Document doc, Node period) {
  Node newChild = doc.createElementNS(
      "http://www.art.org/schemas/art", "artist");
  Node insertedNode = period.appendChild(newChild);
  assert(newChild == insertedNode);
  assert(insertedNode == period.getLastChild());
}
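The type constraints mentioned earlier surface at run time as a DOMException. The sketch below, which assumes the DOM implementation bundled with the JDK, wraps Node.appendChild in a hypothetical helper named tryAppend that reports whether the hierarchy accepted the child; the class name HierarchyRules is likewise illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class HierarchyRules {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the child was appended, false if the parent's
    // node type does not permit a child of that type.
    static boolean tryAppend(Node parent, Node child) {
        try {
            parent.appendChild(child);
            return true;
        } catch (DOMException e) {
            if (e.code == DOMException.HIERARCHY_REQUEST_ERR)
                return false;
            throw e; // some other failure, such as a wrong-document error
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node artist = doc.createElementNS(
            "http://www.art.org/schemas/art", "artist");
        // An Element accepts a Text child; a Document does not.
        System.out.println(tryAppend(artist, doc.createTextNode("Monet")));
        System.out.println(tryAppend(doc, doc.createTextNode("stray text")));
    }
}
```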
The Node.insertBefore method accepts an additional object reference as a parameter. This reference indicates the node that the insertion should precede. If this parameter is null, Node.insertBefore behaves exactly like Node.appendChild. Consider the following code that uses both insertion techniques:
import org.w3c.dom.*;

void addTwoPeriods(Document doc, Node art) {
  Node newChild1 = doc.createElementNS(
      "http://www.art.org/schemas/art", "period");
  art.appendChild(newChild1);
  Node newChild2 = doc.createElementNS(
      "http://www.art.org/schemas/art", "period");
  // insert the second new node before art's second child
  Node pos = art.getFirstChild().getNextSibling();
  Node insertedNode = art.insertBefore(newChild2, pos);
  assert(newChild2 == insertedNode);
  assert(insertedNode == art.getChildNodes().item(1));
}
Figure 2.8 shows the results of this code.
Figure 2.8. Node.appendChild and Node.insertBefore
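The null-reference behavior of Node.insertBefore makes it easy to write a helper that always inserts at the front of the child list, even when the list is empty. The following is a minimal sketch; the class Prepend and its prepend method are hypothetical names, and the JDK's bundled DOM implementation is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class Prepend {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Inserts newChild as the first child of parent. For an empty parent,
    // getFirstChild() is null, and a null reference node makes
    // insertBefore behave like appendChild.
    static Node prepend(Node parent, Node newChild) {
        return parent.insertBefore(newChild, parent.getFirstChild());
    }
}
```

Calling prepend twice on an empty element leaves the second node first and the first node last, with no special case needed for the empty list.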
Whereas the DOM's Node.nodeType reinvents RTTI to distinguish node types, the DOM leverages the native programming system's notion of object identity. To this end, a given instance of Node can only appear once in a document hierarchy. When calling Node.insertBefore or Node.appendChild, if the node to be inserted already appears elsewhere in the DOM hierarchy, it will automatically be removed from its current location before being inserted at its new location. For example, consider the following Java code:
import org.w3c.dom.*;

void doubleIt(Document doc, Node parent) {
  Node child = doc.createElementNS("", "child");
  parent.appendChild(child);
  parent.insertBefore(child, parent.getFirstChild());
}
Because a given node object can only appear once in a document hierarchy, this code winds up inserting exactly one Element node at the beginning of the list of child nodes of parent. The work done by the call to Node.appendChild is undone by the call to Node.insertBefore that "reinserts" the same node.
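This behavior can be confirmed by counting children before and after the two calls. The sketch below repeats the same append-then-insert sequence inside a hypothetical method named reinsertAtFront (the class name SingleIdentity is also illustrative), passes null to createElementNS to request no namespace, and assumes the JDK's bundled DOM implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class SingleIdentity {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends a new node and then inserts the same node at the front.
    // The second call moves the node rather than adding a second copy.
    static Node reinsertAtFront(Document doc, Node parent) {
        Node child = doc.createElementNS(null, "child");
        parent.appendChild(child);
        parent.insertBefore(child, parent.getFirstChild());
        return child;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node parent = doc.createElementNS(null, "parent");
        parent.appendChild(doc.createElementNS(null, "existing"));
        int before = parent.getChildNodes().getLength();
        Node child = reinsertAtFront(doc, parent);
        int after = parent.getChildNodes().getLength();
        System.out.println("children added: " + (after - before));
        System.out.println("new node is first: "
            + (parent.getFirstChild() == child));
    }
}
```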
In addition to inserting nodes into a list of child nodes, it is also possible to remove or replace a node via the Node.removeChild and Node.replaceChild methods, respectively. Consider the following Java code:
import org.w3c.dom.*;

void figureNine(Document doc, Node art) {
  Node newChild = doc.createElementNS(
      "http://www.art.org/schemas/art", "period");
  art.replaceChild(newChild, art.getFirstChild());
  art.removeChild(art.getLastChild());
}
The results of this code are shown in Figure 2.9. Note that the resources held by the node that is replaced or removed are not necessarily destroyed at removal-time.
Figure 2.9. Node.replaceChild and Node.removeChild
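Because a removed node is not destroyed, it can be inspected and reinserted later. The following sketch shows one way to take advantage of this; the class Detach and its detach helper are hypothetical names, and the JDK's bundled DOM implementation is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class Detach {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes n from its parent, if it has one, and returns it.
    // The node keeps its owner document and its own children.
    static Node detach(Node n) {
        Node parent = n.getParentNode();
        return (parent == null) ? n : parent.removeChild(n);
    }

    public static void main(String[] args) {
        String ns = "http://www.art.org/schemas/art";
        Document doc = newDocument();
        Node art = doc.createElementNS(ns, "art");
        Node period = art.appendChild(doc.createElementNS(ns, "period"));
        period.appendChild(doc.createElementNS(ns, "artist"));

        detach(period);
        System.out.println("still has its child: " + period.hasChildNodes());
        art.appendChild(period); // the same object can be reinserted
        System.out.println("reattached: " + (period.getParentNode() == art));
    }
}
```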
It is interesting to note that the Node.replaceChild method simply combines insertion and removal in a single operation. For example, this Java method
import org.w3c.dom.*;

void swap(Node parent, Node oldChild, Node newChild) {
  parent.insertBefore(newChild, oldChild);
  parent.removeChild(oldChild);
}
could be replaced with
import org.w3c.dom.*;

void swap(Node parent, Node oldChild, Node newChild) {
  parent.replaceChild(newChild, oldChild);
}
The latter is obviously more convenient and maintainable.
Finally, the Node.cloneNode method provides a way to copy a node and, optionally, all of its descendant nodes. The newly created clone is not attached to any document hierarchy and does not have a parent until it is inserted using one of the methods described above. The new node will be an exact copy of the original node. In addition, if the deep parameter is true, Node.cloneNode will recursively copy all child nodes as well. If the node happens to be an Element node, Node.cloneNode will also copy all of the attribute information items associated with the element. The following example copies a period element and all of its children and attributes and then inserts it into the hierarchy:
import org.w3c.dom.*;

void cloneLastPeriod(Node art) {
  Node clone = art.getLastChild().cloneNode(true);
  art.appendChild(clone);
}
As shown in Figure 2.10, this simple method call can do a lot of work.
Figure 2.10. Node.cloneNode
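The difference between a shallow and a deep clone is easiest to see side by side. The sketch below builds a small period element and clones it both ways; the class CloneDemo, the helper buildPeriod, and the attribute name and value are illustrative assumptions, and the JDK's bundled DOM implementation is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class CloneDemo {
    static final String NS = "http://www.art.org/schemas/art";

    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates <art><period name="Renaissance"><artist/><artist/></period></art>
    // as the document's content and returns the period element.
    static Element buildPeriod(Document doc) {
        Element art = doc.createElementNS(NS, "art");
        Element period = doc.createElementNS(NS, "period");
        period.setAttribute("name", "Renaissance");
        period.appendChild(doc.createElementNS(NS, "artist"));
        period.appendChild(doc.createElementNS(NS, "artist"));
        art.appendChild(period);
        doc.appendChild(art);
        return period;
    }

    public static void main(String[] args) {
        Element period = buildPeriod(newDocument());
        Node shallow = period.cloneNode(false); // attributes, no children
        Node deep = period.cloneNode(true);     // attributes and descendants
        System.out.println("shallow children: "
            + shallow.getChildNodes().getLength());
        System.out.println("deep children: "
            + deep.getChildNodes().getLength());
        System.out.println("clone has no parent: "
            + (deep.getParentNode() == null));
    }
}
```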
The various parent/child attributes of the Node interface are considered live in the face of updates. Additionally, the object returned by childNodes doesn't represent a static snapshot of the child nodes but rather a dynamic cursor over the current collection of children. Any changes to the underlying content are automatically visible in all outstanding NodeList references, as shown here:
NodeList nl = period.getChildNodes();
// nl.getLength() == 3
Node newNode = doc.createElement("author");
period.appendChild(newNode);
// nl.getLength() == 4
Notice that after inserting a new author node under period, the changes are automatically visible to the outstanding NodeList reference, nl, without any additional intervention.
Because NodeLists are live, care needs to be taken when dealing with them in certain situations. For example, take a look at the following code that attempts to remove all nodes from a NodeList:
NodeList nl = node.getChildNodes();
for (int i = 0; i < nl.getLength(); i++) {
  node.removeChild(nl.item(i));
}
This code doesn't work as expected because the NodeList is live. If there were 5 nodes in the list, it's only going to remove nodes 1, 3, and 5. Every time a node is removed, the NodeList's length changes along with each node index. To handle this type of operation properly, it's better to delete from the front of the list until the list is empty.
NodeList nl = node.getChildNodes();
while (nl.getLength() > 0) {
  node.removeChild(nl.item(0));
}
Of course, this code could have also been written using Node.firstChild.
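A sketch of the Node.firstChild variant follows, together with the index-based loop so that the two outcomes can be compared on the same input. The class ChildRemoval and the method names removeAllChildren and removeByIndex are hypothetical, and the JDK's bundled DOM implementation is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ChildRemoval {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Removes every child by repeatedly deleting the current first child.
    static void removeAllChildren(Node node) {
        Node child;
        while ((child = node.getFirstChild()) != null)
            node.removeChild(child);
    }

    // The index-based loop from the text. Because the list is live,
    // each removal shifts the remaining nodes down by one position,
    // so every other node is skipped. Returns the number left behind.
    static int removeByIndex(Node node) {
        NodeList nl = node.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++)
            node.removeChild(nl.item(i));
        return nl.getLength();
    }
}
```

With five children, removeByIndex leaves two behind (the original second and fourth nodes), while removeAllChildren always leaves the node empty.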
The parentNode value is also dynamic in the face of node manipulation. For example, upon creation via createElementNS, a new element's parentNode attribute will be null. However, once it has been inserted into a document's hierarchy, the element's parentNode attribute will be adjusted accordingly.
Node n = doc.createElementNS("http://books.org", "author");
// n.parentNode == null
doc.appendChild(n); // insert the node
// n.parentNode == doc
All nodes except for Document, DocumentFragment, Attr, Entity, and Notation nodes will have a non-null parent node when they exist within the hierarchy.
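Two of these exceptions are easy to observe in code. The sketch below checks an Attr node and a DocumentFragment node; the class name ParentlessNodes and the element and attribute names are illustrative, and the JDK's bundled DOM implementation is assumed. Note that an attribute is reached from its element through Attr.getOwnerElement rather than through parentNode.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

final class ParentlessNodes {
    // Builds an empty Document using the JDK's JAXP implementation.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element period = doc.createElementNS(null, "period");
        period.setAttribute("name", "Renaissance");

        // An attribute belongs to its element but is not one of its children.
        Attr name = period.getAttributeNode("name");
        System.out.println("attr parent is null: "
            + (name.getParentNode() == null));
        System.out.println("attr owner is period: "
            + (name.getOwnerElement() == period));

        // A fragment can act as a parent but never has one itself.
        DocumentFragment frag = doc.createDocumentFragment();
        frag.appendChild(period);
        System.out.println("fragment parent is null: "
            + (frag.getParentNode() == null));
        System.out.println("period parent is fragment: "
            + (period.getParentNode() == frag));
    }
}
```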