Implementation vs Interface
The discussion so far has focused on DOM Level 2 core functionality, the base set of features that the DOM working group requires every implementation to provide. The working group has also defined interfaces that model peripheral functionality, which implementations may support or omit as they see fit. Rather than use the well-known component development techniques[11] for interface discovery, the DOM relies on a hard-coded method, DOMImplementation.hasFeature, to determine whether a given implementation supports a feature. hasFeature takes a feature name and a version string, so you can query for specific versions of DOM feature support. The following Java code tests whether the DOM implementation supports the DOM Level 2 XML Core functionality and falls back to DOM Level 1 XML Core features otherwise:
    if (domimp.hasFeature("XML", "2.0")) {
        // supports DOM Level 2 XML Core
    } else if (domimp.hasFeature("XML", "1.0")) {
        // otherwise give up or revert to DOM Level 1 XML Core
    }
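The fragment above leaves open where domimp comes from. The following is a minimal sketch that fills in that gap, assuming JAXP (javax.xml.parsers) is used to obtain a DOMImplementation; JAXP is not part of the discussion here, and the class and method names are hypothetical. The result depends on whichever parser the JAXP factory selects.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

class CoreLevelProbe {

    // Returns 2 or 1 for the highest XML Core level the implementation
    // reports through hasFeature, or 0 if it claims neither.
    static int xmlCoreLevel(DOMImplementation domimp) {
        if (domimp.hasFeature("XML", "2.0")) {
            return 2;
        } else if (domimp.hasFeature("XML", "1.0")) {
            return 1;
        }
        return 0;
    }

    // Assumption: JAXP supplies the DOMImplementation. The original text
    // does not say how domimp is obtained.
    static DOMImplementation defaultImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        int level = xmlCoreLevel(defaultImplementation());
        System.out.println("Reported XML Core level: " + level);
    }
}
```

Returning a level number rather than branching inline keeps the feature test in one place, so calling code can decide once which API subset to use.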
DOM Level 2 adds several chapters to the specification that formalize additional (optional) DOM features such as CSS, Events, Range, Stylesheets, Traversal, and Views. To test whether a DOM implementation supports one of these features, use the appropriate feature string along with a version string of "2.0". See Table 2.5 for a list of all the possible DOM features and available feature versions at publication time.
The DOM is notorious for lacking explicit mechanisms for translating between serialized XML documents and DOM hierarchies. While future versions of the DOM may address this need, for now each XML parser must define its own proprietary interface for performing I/O operations. For example, the Apache Software Foundation's Xerces-J uses a SAX-based parser to load an XML document into the DOM structure, as shown here:
    try {
        org.apache.xerces.parsers.DOMParser parser =
            new org.apache.xerces.parsers.DOMParser();
        parser.parse("http://www.develop.com/book.xml");
        org.w3c.dom.Document doc = parser.getDocument();
        // use DOM Document here...
    } catch (org.xml.sax.SAXException e) {
        // the document could not be parsed
    } catch (java.io.IOException e) {
        // the document could not be read
    }
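The Xerces-specific loading code above ties the program to one parser. As a contrast, here is a minimal sketch of a round trip between serialized XML and a DOM tree, assuming the JAXP APIs (javax.xml.parsers and javax.xml.transform) are available. This swaps in a different technique from the proprietary Xerces interface shown in the text, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomRoundTrip {

    // Parses serialized XML held in a string into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Serializes a DOM Document back to text using an identity transform.
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<book><authors></authors></book>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(serialize(doc));
    }
}
```

Both helpers wrap checked exceptions in unchecked ones only to keep the sketch short; production code would normally report parse and I/O failures separately, as the Xerces example does.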
Table 2.5. DOM Implementation Features
DOM Feature | Feature Name | Known Versions |
XML | "XML" | "1.0", "2.0" |
HTML | "HTML" | "1.0", "2.0" |
CSS | "CSS" | "2.0" |
CSS Extended Interfaces | "CSS2" | "2.0" |
Events | "Events" | "2.0" |
User Interface Events | "UIEvents" | "2.0" |
Mouse Events | "MouseEvents" | "2.0" |
Mutation Events | "MutationEvents" | "2.0" |
HTML Events | "HTMLEvents" | "2.0" |
Range | "Range" | "2.0" |
StyleSheets | "StyleSheets" | "2.0" |
Traversal | "Traversal" | "2.0" |
Views | "Views" | "2.0" |
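The feature names in Table 2.5 can be probed in a loop. The sketch below queries each name at version "2.0" and records what the implementation reports. It assumes JAXP is used to obtain the DOMImplementation, the class and method names are hypothetical, and the answers vary from one parser to another.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

class FeatureTableProbe {

    // Feature names copied from Table 2.5.
    static final String[] FEATURES = {
        "XML", "HTML", "CSS", "CSS2", "Events", "UIEvents", "MouseEvents",
        "MutationEvents", "HTMLEvents", "Range", "StyleSheets", "Traversal", "Views"
    };

    // Maps each feature name to the hasFeature answer for version "2.0",
    // preserving the order of the table.
    static Map<String, Boolean> probe(DOMImplementation domimp) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String feature : FEATURES) {
            result.put(feature, domimp.hasFeature(feature, "2.0"));
        }
        return result;
    }

    // Assumption: JAXP supplies the DOMImplementation.
    static DOMImplementation defaultImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> entry : probe(defaultImplementation()).entrySet()) {
            System.out.println(entry.getKey() + " 2.0: " + entry.getValue());
        }
    }
}
```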
Xerces-J also provides a few implementation-specific features to tweak the DOM behavior. The http://apache.org/xml/features/dom/defer-node-expansion feature makes it possible to defer node expansion until the DOM tree is traversed. The http://apache.org/xml/features/dom/create-entity-ref-nodes feature controls whether or not entity reference nodes show up in the DOM tree. Aside from these implementation-specific features and the proprietary serialization mechanism described above, Xerces-J completely adheres to the DOM Level 2 Core API.
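As a rough illustration of how such a feature might be switched, the sketch below passes the defer-node-expansion URI to a JAXP DocumentBuilderFactory. It assumes the factory is backed by a Xerces-derived parser that recognizes this URI; a parser that does not recognize it is expected to reject the setting, which the sketch reports instead of failing. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParserFeatureSketch {

    // Feature URI quoted in the text above.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    // Tries to set a parser feature; returns false if the factory rejects it.
    static boolean trySetFeature(DocumentBuilderFactory factory, String uri, boolean value) {
        try {
            factory.setFeature(uri, value);
            return true;
        } catch (ParserConfigurationException e) {
            return false;
        }
    }

    // Parses XML text, requesting the given deferral setting when the
    // underlying parser accepts the feature.
    static Document parse(String xml, boolean deferNodeExpansion) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        boolean accepted = trySetFeature(factory, DEFER_NODE_EXPANSION, deferNodeExpansion);
        if (!accepted) {
            System.err.println("Parser does not recognize " + DEFER_NODE_EXPANSION);
        }
        try {
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<book><authors/></book>", false);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Because the feature is implementation-specific, treating rejection as a non-fatal condition keeps the code usable with parsers that offer no such switch.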
Another commonly used XML processor is Microsoft's MSXML 3.0. MSXML 3.0 is a COM-based implementation of the DOM Level 2 feature set. MSXML's COM language binding prefixes all DOM interface names with IXMLDOM. For example, the Node interface in MSXML is called IXMLDOMNode. The ECMAScript bindings are mostly compatible with the DOM Level 2 specification at the time of this writing. In terms of serialization, MSXML adds load, loadXML, and save methods to the Document interface. The loadXML method expects the serialized XML stream to be passed as a literal string. The load and save methods expect either a system identifier or an IStream interface to the actual data. The following ECMAScript demonstrates the MSXML parser:
    var doc = new ActiveXObject("msxml.domdocument");
    doc.async = false;
    doc.validateOnParse = false;
    if (doc.load("http://www.awl.com/book.xml"))
        myHandler(doc, doc.documentElement);
    else
        error(doc.parseError.reason);

    // load another document
    doc.loadXML("<book><authors></authors></book>");
This example also illustrates several other MSXML-specific extensions. The async property is used to control asynchronous loading of the XML document. If async is not explicitly set to false, the Document will load asynchronously.[12] The validateOnParse property controls document validation against a DTD or XML-Data Reduced schema definition. Finally, the parseError property makes it easy to figure out what went wrong if loading fails. MSXML provides several other proprietary extensions that are commonly used by programmers today, including the xml, text, async, readyState, preserveWhiteSpace, nodeTypedValue, and nodeTypeString properties.