The Node Interface
The primary purpose of the Node interface is to define the base functionality for all node types. It defines the set of attributes, methods, and constants that must be available on any node within the DOM hierarchy. This makes it possible to traverse a DOM hierarchy in a uniform fashion strictly using the Node interface without having to downcast to more specific interface types. The more specific interfaces are available when necessary to access features that only make sense for a given node type.
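As a minimal sketch of that uniform traversal, the following Java class walks a tree using nothing but members of Node. The class name NodeWalker, the helpers countNodes and parse, and the sample document are illustrative assumptions, and JAXP is assumed as the way to obtain a parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeWalker {
    // Counts every node in the subtree rooted at n, touching only Node members.
    static int countNodes(Node n) {
        int count = 1; // the node itself
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child); // no downcast needed to descend
        }
        return count;
    }

    // Hypothetical helper: parses a string into a DOM tree.
    // Checked exceptions are wrapped to keep the sketch short.
    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node doc = parse("<a><b>text</b><!-- note --><c/></a>");
        System.out.println("nodes in tree: " + countNodes(doc));
    }
}
```

The walk never asks what kind of node it is visiting. Document, Element, Text, and Comment nodes are all handled through the same two navigation methods.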
One of the tradeoffs of a fairly general base interface like Node is that some of its operations do not apply to all of the node types. The DOM working group made a conscious decision not to factor the Node interface down to the smallest common subset. Rather, the working group tried to balance the benefits of type safety against the convenience and uniformity of a single model-wide interface. The cost of this compromise is that some errors are not detected until runtime rather than at compile time.
For generality, the DOM does not rely on the use of runtime type identification (RTTI) in the target programming language. Rather, the nodeType attribute is used to test a node for compatibility with a derived interface (for example, Element, Document). The Node interface also defines a group of symbolic constants that correspond to the different node types.
interface Node {
  readonly attribute unsigned short nodeType;
  const unsigned short ELEMENT_NODE = 1;
  const unsigned short ATTRIBUTE_NODE = 2;
  const unsigned short TEXT_NODE = 3;
  const unsigned short CDATA_SECTION_NODE = 4;
  const unsigned short ENTITY_REFERENCE_NODE = 5;
  const unsigned short ENTITY_NODE = 6;
  const unsigned short PROCESSING_INSTRUCTION_NODE = 7;
  const unsigned short COMMENT_NODE = 8;
  const unsigned short DOCUMENT_NODE = 9;
  const unsigned short DOCUMENT_TYPE_NODE = 10;
  const unsigned short DOCUMENT_FRAGMENT_NODE = 11;
  const unsigned short NOTATION_NODE = 12;
  :  :  :
};
The nodeType attribute acts as a poor man's RTTI mechanism. For example, the Java function
boolean isElement(org.w3c.dom.Node n) {
  return n.getNodeType() == org.w3c.dom.Node.ELEMENT_NODE;
}
is functionally equivalent to
boolean isElement(org.w3c.dom.Node n) {
  return n instanceof org.w3c.dom.Element;
}
The advantage of the latter is that it is integrated into the type system of the language. The advantage of the former is that it works consistently even in typeless languages (such as ECMAScript) where RTTI is impractical. Additionally, the former approach gives implementations more flexibility in deciding which concrete classes implement which interfaces. For that reason, one should always use the former style of test to ensure portability.
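The portable style of test extends naturally to a dispatch on nodeType, with a downcast only after the type has been confirmed. The following is a sketch under that assumption. The class NodeTypeDispatch and its helpers describe and newDocument are hypothetical names chosen for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

public class NodeTypeDispatch {
    // Describes a node by testing nodeType first and downcasting only afterwards.
    static String describe(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element <" + ((Element) n).getTagName() + ">";
            case Node.TEXT_NODE:
                return "text of length " + ((Text) n).getLength();
            case Node.DOCUMENT_NODE:
                return "document";
            default:
                return "node type " + n.getNodeType();
        }
    }

    // Hypothetical helper: creates an empty document to act as a node factory.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(describe(doc));
        System.out.println(describe(doc.createElement("item")));
        System.out.println(describe(doc.createTextNode("hello")));
        System.out.println(describe(doc.createComment("ignored")));
    }
}
```

Because the switch keys on the numeric constants rather than on class identity, it does not depend on how a particular implementation maps its concrete classes onto the DOM interfaces.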
Many node types support names. These names are exposed via the Node.nodeName property, and for node types such as ProcessingInstruction or Entity, this is sufficient. However, because both element and attribute names can be affiliated with a namespace URI, the Node interface also contains attributes for retrieving namespace information.
interface Node {
  readonly attribute DOMString nodeName;
  readonly attribute DOMString namespaceURI;
           attribute DOMString prefix;
  readonly attribute DOMString localName;
  :  :  :
};
The Node.nodeName attribute always returns the QName based on the qualified name of the element or attribute information item. More importantly, the Node interface makes the [namespace URI] and [local name] Infoset properties available via the Node.namespaceURI and Node.localName attributes. For convenience, the actual prefix used in the QName is available via the Node.prefix attribute. Since only element and attribute names may be affiliated with a namespace, these namespace-specific attributes return null for all other node types.
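To make the four name-related attributes concrete, here is a small sketch that prints them for a namespace-qualified element and for its text child. The class NamespaceNames, the helpers parse and names, and the urn:example:orders namespace are illustrative assumptions. Note that a JAXP DocumentBuilderFactory must be switched to namespace-aware mode for these attributes to be populated.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceNames {
    // Hypothetical helper: parses a string with namespace processing enabled.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Formats nodeName, namespaceURI, prefix, and localName on one line.
    static String names(Node n) {
        return n.getNodeName() + " | " + n.getNamespaceURI()
                + " | " + n.getPrefix() + " | " + n.getLocalName();
    }

    public static void main(String[] args) {
        Document doc = parse("<p:order xmlns:p='urn:example:orders'>pending</p:order>");
        Element root = doc.getDocumentElement();
        System.out.println(names(root));                 // element: all four are set
        System.out.println(names(root.getFirstChild())); // text node: only nodeName is set
    }
}
```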
There are several node types that don't have an obvious name (for example, Document, Text, or Comment nodes). The DOM specification defines fixed values that must be used for the Node.nodeName of these node types. For example, the Node.nodeName attribute for Document nodes always evaluates to "#document". For text nodes the attribute must evaluate to "#text", for comment nodes to "#comment", and so on.
A node's value is accessed generically via the Node.nodeValue attribute. Certain node types can have values, whereas others cannot. For example, Element nodes only have children and their nodeValue is always null. In contrast, for Text nodes, the nodeValue attribute evaluates to the character data content of the text node. Table 2.4 lists the nodeName/nodeValue values for each of the possible node types. Figure 2.6 shows the nodeName and nodeValue values for each node in the DOM hierarchy shown earlier in this section.
Figure 2.6. Node types, names, and values
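The name and value rules described above can be checked directly by creating one node of several types and printing both attributes. This is only a sketch: the class NameAndValue, its helpers, and the sample names and values are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class NameAndValue {
    // Formats a node's generic name and value using only the Node interface.
    static String nameAndValue(Node n) {
        return n.getNodeName() + " = " + n.getNodeValue();
    }

    // Hypothetical helper: creates an empty document to act as a node factory.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Attr id = doc.createAttribute("id");
        id.setValue("42");
        Node[] samples = {
            doc,                                  // fixed name, no value
            doc.createElement("item"),            // tag name, no value
            id,                                   // attribute name and value
            doc.createTextNode("hello"),          // fixed name, character data
            doc.createCDATASection("<raw>"),      // fixed name, section content
            doc.createProcessingInstruction("xml-stylesheet", "href='a.xsl'"),
            doc.createComment("note"),            // fixed name, comment content
            doc.createDocumentFragment()          // fixed name, no value
        };
        for (Node n : samples) {
            System.out.println(nameAndValue(n));
        }
    }
}
```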
Both nodeType and nodeName are read-only properties. Because of this, there is no way to change a node's name once it has been created. While this does simplify DOM implementations, there are still situations where this type of functionality is necessary (such as with an XML editor). If you need to change a node's name on the fly, you're required to create a completely new node and copy the node's value as appropriate.
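One way to approximate a rename under these constraints is sketched below for elements: create a replacement element, carry over the attributes and children, and swap it into the tree. The class RenameByCopy and its helpers rename and parse are hypothetical, and the sketch assumes names without namespaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RenameByCopy {
    // Replaces old with a new element called newName and returns the new element.
    // Assumption: no namespaces are in play, so createElement/setAttribute suffice.
    static Element rename(Element old, String newName) {
        Document doc = old.getOwnerDocument();
        Element fresh = doc.createElement(newName);

        // Copy each attribute by name and value.
        NamedNodeMap attrs = old.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            fresh.setAttribute(a.getName(), a.getValue());
        }

        // Move the children; appendChild detaches each one from old first.
        while (old.getFirstChild() != null) {
            fresh.appendChild(old.getFirstChild());
        }

        // Swap the new element into the old one's position, if it had one.
        Node parent = old.getParentNode();
        if (parent != null) {
            parent.replaceChild(fresh, old);
        }
        return fresh;
    }

    // Hypothetical helper: parses a string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><old id='1'>text</old></doc>");
        Element target = (Element) doc.getDocumentElement().getFirstChild();
        Element renamed = rename(target, "fresh");
        System.out.println(renamed.getNodeName() + " id=" + renamed.getAttribute("id"));
    }
}
```

Any references that other code holds to the original element still point at the detached old node, so callers need to switch to the returned element.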
Table 2.4. DOM Node Types, Names and Values
Node type | nodeType value | nodeName | nodeValue |
Element | 1 | Tag name | null |
Attr | 2 | Name of attribute | Value of attribute |
Text | 3 | #text | Content of the text node |
CDATASection | 4 | #cdata-section | Content of the CDATA section |
EntityReference | 5 | Name of entity referenced | null |
Entity | 6 | Entity name | null |
ProcessingInstruction | 7 | Target | Entire content excluding the target |
Comment | 8 | #comment | Content of the comment |
Document | 9 | #document | null |
DocumentType | 10 | Document type name | null |
DocumentFragment | 11 | #document-fragment | null |
Notation | 12 | Notation name | null |