- Simple API For XML Version 2 (SAX2)
- Auxiliary SAX Interfaces
- SAX and I/O
- SAX Error Handling
- The Glue of SAX: XMLReader
- The Document Object Model
- The Object Model
- The DOM and Factories
- The Node Interface
- Parents and Children
- Nonhierarchical Nodes
- Text Nodes
- Element and Attribute Nodes
- Document, Document Type, and Entity Nodes
- Bulk Insertion Using Document Fragment
- DOM Error Handling
- Implementation vs Interface
- DOM Traversal
- Where Are We?
The Object Model
The DOM is a projection of the XML Infoset. The object model of the DOM represents the Infoset as a tree-structured graph of nodes. The DOM specifies several aspects of this graph, including the interfaces that must be supported by each node, the syntax and semantics of each node interface, and the relationships between the different node types. The DOM does not, however, mandate how the underlying code is structured or what algorithms or data structures are used to maintain the internal form of the underlying information items.
Figure 2.4 shows the UML model of the DOM. The focal point of the DOM is the Node interface, which acts as the base interface for all node types. Table 2.3 shows the various node types and their corresponding Infoset information items where applicable. Because virtually everything is a node, traversal code is extremely uniform: a standard set of methods is available no matter where one is in the object model. However, each node also supports an extended interface type that exposes information item-specific functionality in a type-safe manner.
Figure 2.4. DOM Interfaces
Table 2.3. DOM Nodes and the Infoset
| DOM Node | Infoset Information Item |
|---|---|
| Document | Document Information Item |
| DocumentFragment | N/A |
| DocumentType | Document Type Declaration Information Item |
| EntityReference | Entity Start/End Marker Information Items |
| Element | Element Information Item |
| Attr | Attribute Information Item |
| ProcessingInstruction | Processing Instruction Information Item |
| Comment | Comment Information Item |
| Text | Sequence of Character Information Items |
| CDATASection | CDATA Start/End Marker Information Items |
| Entity | Entity Information Item |
| Notation | Notation Information Item |
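Because every node type in Table 2.3 shares the Node interface, one recursive function can walk any DOM tree without knowing in advance which node types it will meet. The sketch below assumes Python's standard-library `xml.dom.minidom` as the DOM implementation (the text itself is language-neutral); `NODE_TYPE_NAMES` and `count_node_types` are hypothetical names chosen for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Interface names from Table 2.3, keyed by the standard DOM nodeType constants.
NODE_TYPE_NAMES = {
    Node.DOCUMENT_NODE: "Document",
    Node.DOCUMENT_FRAGMENT_NODE: "DocumentFragment",
    Node.DOCUMENT_TYPE_NODE: "DocumentType",
    Node.ENTITY_REFERENCE_NODE: "EntityReference",
    Node.ELEMENT_NODE: "Element",
    Node.ATTRIBUTE_NODE: "Attr",
    Node.PROCESSING_INSTRUCTION_NODE: "ProcessingInstruction",
    Node.COMMENT_NODE: "Comment",
    Node.TEXT_NODE: "Text",
    Node.CDATA_SECTION_NODE: "CDATASection",
    Node.ENTITY_NODE: "Entity",
    Node.NOTATION_NODE: "Notation",
}


def count_node_types(node, counts=None):
    """Tally node types in a subtree using only the generic Node interface
    (nodeType and childNodes). Attr nodes are not children of their element,
    so this walk never reaches them."""
    if counts is None:
        counts = {}
    name = NODE_TYPE_NAMES[node.nodeType]
    counts[name] = counts.get(name, 0) + 1
    for child in node.childNodes:
        count_node_types(child, counts)
    return counts


doc = parseString("<a><!--note--><?pi data?><b>hi</b></a>")
print(count_node_types(doc))
# {'Document': 1, 'Element': 2, 'Comment': 1, 'ProcessingInstruction': 1, 'Text': 1}
```

Code that needs node-specific features, such as an element's `tagName` or `getAttribute`, typically tests `nodeType` first and then uses the extended Element interface; in a statically typed binding such as Java's `org.w3c.dom`, that is where a cast to `Element` appears.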
To see how the DOM object model reflects the Infoset, consider the following serialized XML document:
```xml
<?xml version="1.0"?>
<?order alpha ascending?>
<art xmlns='http://www.art.org/schemas/art'>
  <period name="Renaissance">
    <artist>Leonardo da Vinci</artist>
    <artist>Michelangelo</artist>
    <artist>Donatello</artist>
  </period>
  <!-- insert period here -->
</art>
```
Figure 2.5 shows what happens when this XML document is projected onto the DOM. Notice that the topmost node in the DOM structure corresponds to the document information item and is of type Document. The Document node has two child nodes that correspond to the document information item's [children] property: a ProcessingInstruction node and an Element node. The Element node is the distinguished document element and has two child nodes corresponding to the element information item's [children] property: one Element node and one Comment node. That Element node has three Element nodes as children, again corresponding to the [children] Infoset property. (This description ignores the whitespace between tags; a parser that preserves that whitespace also produces Text nodes for it.)
Figure 2.5. The DOM and art
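To check the shape just described, the following sketch parses the art document and produces one line per node, indented by depth. It again assumes Python's `xml.dom.minidom`; `ART_XML`, `significant_children`, and `outline` are hypothetical names. Whitespace-only Text nodes, which minidom keeps for the indentation between tags, are filtered out so the output matches the description of Figure 2.5.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

ART_XML = """<?xml version="1.0"?>
<?order alpha ascending?>
<art xmlns='http://www.art.org/schemas/art'>
  <period name="Renaissance">
    <artist>Leonardo da Vinci</artist>
    <artist>Michelangelo</artist>
    <artist>Donatello</artist>
  </period>
  <!-- insert period here -->
</art>"""


def significant_children(node):
    """Child nodes, skipping whitespace-only Text nodes used for indentation."""
    return [c for c in node.childNodes
            if not (c.nodeType == Node.TEXT_NODE and not c.data.strip())]


def outline(node, depth=0, lines=None):
    """Return one line per node: indentation, class name, and nodeName.
    minidom's class names match the DOM interface names (Element, Text, ...)."""
    if lines is None:
        lines = []
    lines.append("  " * depth + f"{type(node).__name__} {node.nodeName}")
    for child in significant_children(node):
        outline(child, depth + 1, lines)
    return lines


print("\n".join(outline(parseString(ART_XML))))
```

The first lines of output are `Document #document`, then `ProcessingInstruction order` and `Element art` one level down, mirroring the two [children] of the document information item.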
As just described, there is a striking similarity between the node-based model of the DOM and the Infoset. Where the DOM's node-based model departs from the Infoset is in its treatment of character information items. The Infoset treats each character in element content as a distinct information item. This is reasonable for an abstract model, but allocating one object per character would make the DOM far too slow and memory-hungry to be usable. For that reason, the DOM aggregates adjacent character information items into a single node of type Text. Note also that the nodeValue property of an Element node is always null. Rather, to access the character data among an element's [children], one must first access the Text node that is the child of the element. The nodeValue of that Text node contains a string holding the element's character content.
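A short sketch of these two points, again assuming Python's `xml.dom.minidom`: the Element's own `nodeValue` is empty (`None` in Python, `null` in the DOM specification), and its characters are reached through a child Text node. `element_text` is a hypothetical helper, not part of the DOM API.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def element_text(element):
    """Hypothetical helper: join the character data of an element's direct
    Text and CDATASection children (deeper descendants are not visited)."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE))


doc = parseString("<artist>Leonardo da Vinci &amp; workshop</artist>")
artist = doc.documentElement

print(artist.nodeValue)        # None: the Element carries no character data itself
text = artist.firstChild       # the characters live in a child Text node
print(text.nodeValue)          # Leonardo da Vinci & workshop
# Expat may report the text around the &amp; reference in several pieces;
# minidom coalesces adjacent character data into a single Text node.
print(len(artist.childNodes))  # 1
print(element_text(artist))    # Leonardo da Vinci & workshop
```

For mixed content such as `<p>a<b/>c</p>`, the element has several Text children separated by other nodes, which is why a helper like `element_text` iterates over all children instead of reading only `firstChild`.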