- Defining the Document Object Model
- DOM Core Level I
- Creating Document Objects
- Node Interface
- NodeList and NamedNodeMap
- Document Interface
- Element Interface
- Attr Interface
- Additional Interfaces
- Creating DOM Elements
- DOM Level II
- The DOM Core Defined
- Implementation Anomalies
- Summary
- Suggested for Further Study
- Further Reading
DOM Core Level I
The DOM is broken into two parts. The first is the DOM Core Level I. As the name implies, the DOM Core Level I is a core or basic set of interfaces and objects that are required to provide a complete platform upon which other features can be added or layered. Only those features needed to manipulate XML documents at their most basic level are required. Support for additional functionality, such as Cascading Style Sheets (CSS) and events, is not part of the DOM Core.
Documents, Elements, and Nodes
The DOM Core Level I covers three main areas: Documents, Elements, and Nodes. Each of these three interfaces represents objects at a different level in the XML hierarchy. Figure 3.1 shows graphically the parent/child relationship between these DOM objects.
Figure 3.1. Parent/child relationships in the DOM.
Note
In addition to XML, the DOM also supports modeling HTML 4.0 and greater documents. While our primary focus is handling XML, much of the DOM Core functionality applies equally well to HTML with the exception of those areas that specifically target XML such as entities, notations, and the like.
Note
Java developers will find an interesting twist in the DOM API: everything is a node. In practice, this means that the DOM API is really two separate APIs, one hierarchical and one flat. A developer could use nothing but the methods provided on the Node object and never, well, almost never, use any other interface. This flat model is designed to eliminate the need for cast operations, which are typically costly in languages such as Java. For those more inclined to use traditional inheritance-based hierarchies, the DOM provides one of those as well. Developers can use either API or intermix the two as needs dictate.
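As a minimal sketch of the two styles, the following reads the same attribute twice: once through nothing but Node methods, and once through the more specific Element interface. The class name, the helper methods, and the sample price markup are illustrative assumptions, and the parser is the standard JAXP DocumentBuilder rather than anything this chapter prescribes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the DOM API.
public class FlatVersusHierarchical {

    // Parses an XML string, wrapping checked exceptions to keep callers simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Flat style: only methods declared on Node, no casts anywhere.
    static String currencyFlat(Document doc) {
        Node price = doc.getFirstChild();  // the price element, seen as a plain Node
        Node cur = price.getAttributes().getNamedItem("cur");
        return cur.getNodeValue();
    }

    // Hierarchical style: work with the Element interface directly.
    static String currencyHierarchical(Document doc) {
        Element price = doc.getDocumentElement();
        return price.getAttribute("cur");
    }

    public static void main(String[] args) {
        // Sample markup is an assumption made for this sketch.
        Document doc = parse("<price cur=\"USD\">39.99</price>");
        System.out.println(currencyFlat(doc));
        System.out.println(currencyHierarchical(doc));
    }
}
```

Both methods return the same string. Which style reads better depends largely on how much of the surrounding code already knows the concrete type of each node.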
Table 3.1 shows the complete node types and their underlying values as defined by org.w3c.dom.Node.
Table 3.1 Node Types
Constant | Value |
ELEMENT_NODE | 1 |
ATTRIBUTE_NODE | 2 |
TEXT_NODE | 3 |
CDATA_SECTION_NODE | 4 |
ENTITY_REFERENCE_NODE | 5 |
ENTITY_NODE | 6 |
PROCESSING_INSTRUCTION_NODE | 7 |
COMMENT_NODE | 8 |
DOCUMENT_NODE | 9 |
DOCUMENT_TYPE_NODE | 10 |
DOCUMENT_FRAGMENT_NODE | 11 |
NOTATION_NODE | 12 |
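Because getNodeType() returns one of these short values, a switch over the constants is a common way to branch on what kind of node is in hand. The sketch below maps each value back to its constant name; the class and method names are hypothetical and exist only for illustration.

```java
import org.w3c.dom.Node;

// Hypothetical helper; only the constants come from org.w3c.dom.Node.
public class NodeTypeNames {

    // Returns the constant name for a node type value from Table 3.1.
    static String nameOf(short nodeType) {
        switch (nodeType) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN(" + nodeType + ")";
        }
    }

    public static void main(String[] args) {
        // Print the table by walking the values 1 through 12.
        for (short type = 1; type <= 12; type++) {
            System.out.println(nameOf(type) + " | " + type);
        }
    }
}
```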
If we examine a tree, we see that a typical tree has a root, a number of branches, and sub-branches, with each ending in a leaf (except that the root of a DOM tree is normally drawn at the top!). An XML document, or its DOM representation, is no different: the Document interface represents the root, elements represent the branches and sub-branches, and nodes represent the leaves.
Listing 3.1 shows the DTD of a simple catalog representing books that might be used by a publisher. The catalog has a header (0 or 1), a trailer (0 or 1), and any number of entries (0 or more). The catalog element is the root and contains catheader, cattrailer, and some number of elements representing entries in the catalog. Figure 3.2 shows the logical tree that represents a catalog with only a header entry. In reality, things are slightly more complicated than Figure 3.2 suggests. Conceptually, there is a Document object, which represents the root, and its one child node, representing the catheader element, which appears as a leaf node. In actuality, the header node also has a child, a text node, which holds the character data of the element. From a programming standpoint, the text node is a Node, and the header node is an Element.
Figure 3.2. Catalog with only a header entry.
Listing 3.1 DTD of Book Catalog Markup Language
<!-- Book Catalog Markup Language Document Type Definition -->
<!ELEMENT catalog (catheader?,entry*,cattrailer?)>
<!ELEMENT catheader (#PCDATA)>
<!ELEMENT cattrailer (#PCDATA)>
<!ELEMENT entry (title, author+, publisher, price+, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price
    cur CDATA #REQUIRED
    discount (retail|wholesale|other) "retail">
<!ELEMENT isbn (#PCDATA)>
<!ENTITY AuthorName "Albert J. Saganich Jr">
<!ENTITY PublisherInfo "MCP">
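To see the extra level under the header for yourself, one option is to parse a catalog that contains only a catheader and inspect what the parser built. This is a sketch under stated assumptions: the header text "Spring Catalog", the class name, and the helpers are invented for illustration, and the document is parsed without validating it against the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; the sample document is an assumption.
public class HeaderOnlyCatalog {

    // No whitespace between tags, so no incidental text nodes appear in the tree.
    static final String SAMPLE =
            "<catalog><catheader>Spring Catalog</catheader></catalog>";

    // Parses an XML string, wrapping checked exceptions to keep callers simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Returns the catheader node: the first child of the catalog element.
    static Node header(Document doc) {
        return doc.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) {
        Node header = header(parse(SAMPLE));
        Node text = header.getFirstChild();

        // The header is an element; its content lives one level down.
        System.out.println(header.getNodeName() + " has type " + header.getNodeType());
        System.out.println("child has type " + text.getNodeType()
                + " and value " + text.getNodeValue());
        System.out.println("text node has children: " + text.hasChildNodes());
    }
}
```

The header reports the element type, while the string itself is held by a child of type text that has no children of its own.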
As we add information to our XML document, the DOM representation changes accordingly. Adding a cattrailer item results in a tree similar to Figure 3.3.
Figure 3.3. Catalog with header and trailer.
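The same growth can be reproduced in code. The sketch below builds a header-only catalog in memory and then appends a cattrailer, so the catalog element goes from one child to two. The class name, helper names, and text values are assumptions made for this example; the factory methods used (createElement, createTextNode, appendChild) are part of the Document and Node interfaces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; text values are invented for illustration.
public class AddTrailer {

    // Builds a catalog holding a single catheader element.
    static Document newCatalogWithHeader() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an empty document", e);
        }
        Element catalog = doc.createElement("catalog");
        doc.appendChild(catalog);

        Element header = doc.createElement("catheader");
        header.appendChild(doc.createTextNode("Spring Catalog"));
        catalog.appendChild(header);
        return doc;
    }

    // Appends a cattrailer element, with its own text child, to the catalog.
    static void addTrailer(Document doc, String text) {
        Element trailer = doc.createElement("cattrailer");
        trailer.appendChild(doc.createTextNode(text));
        doc.getDocumentElement().appendChild(trailer);
    }

    // Counts the direct children of the catalog element.
    static int childCount(Document doc) {
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) {
        Document doc = newCatalogWithHeader();
        System.out.println("children before: " + childCount(doc));
        addTrailer(doc, "End of catalog");
        System.out.println("children after: " + childCount(doc));
    }
}
```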
If we continue to add elements to our catalog, we eventually get a tree that looks similar to Figure 3.4. Note that, for completeness, we have also shown all the unexpected children of our leaf nodes. As we previously mentioned, one might expect each leaf, for example the cattrailer element, to be represented as a node of type text, with an appropriate name and the value specified in the XML document. However, this is not the case. Each element in the document results in a corresponding element node in the DOM tree, with the underlying data represented as a child node. If we examine this for a moment, it makes perfect sense. The element itself is not the data but rather the description of the data. Elements often carry other information as well, such as attributes, all of which helps explain why the DOM tree is shaped as it is.
Figure 3.4. Catalog with header, trailer, and entry elements.
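The split between an element and its data shows up directly in the API: an element's own node value is null, its data sits in a text child, and its attributes are reached separately. The following is a small sketch of that, using a price element shaped like the one in Listing 3.1; the class name, helper names, and the values 39.99 and USD are assumptions, and the fragment is parsed on its own without the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class; sample values are invented for illustration.
public class ElementVersusData {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // The element describes the data; its own node value is null.
    static String elementValue(Element element) {
        return element.getNodeValue();
    }

    // The data itself is the value of the element's text child.
    static String data(Element element) {
        return element.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Element price = parseRoot("<price cur=\"USD\">39.99</price>");
        System.out.println("element value: " + elementValue(price));
        System.out.println("text child value: " + data(price));
        System.out.println("cur attribute: " + price.getAttribute("cur"));
    }
}
```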