The Document Object Model Level 3 Core Recommendation provides capabilities that make it easier to work with the data contained in an XML Document. These new capabilities fall into several categories, which I'll get into in a moment. They are:
- A standard way to create a Document object from scratch
- Additional information about documents and individual nodes
- The ability to re-process (or re-validate) a document while it's in use
- Additional node manipulation capabilities
- Easier text manipulation
- Enhanced error management
- The ability to attach non-XML data to a Node
So let's take a look at some of these areas.
Bootstrapping
One of the big aggravations in earlier versions of the Document Object Model was that it didn't specify how to actually create a Document object. Sure, you could create a Document directly from a DOMImplementation object, but there wasn't a standard way to create a DOMImplementation, so that didn't help very much. Instead, you had to check the documentation for your parser to get the implementation-specific way to do it. DOM Level 3 solves the problem by creating the DOMImplementationRegistry. The registry includes each of the available DOMImplementation classes, each of which has its own set of capabilities. For example, you may need an implementation that can handle DOM Level 3 Load and Save. To do that, you can request an appropriate implementation from the registry:
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.DOMImplementation;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.Document;

    public class Level3Dom {

        public static void main (String args[]){
            try {
                // Tell the registry which implementation source to use
                System.setProperty(DOMImplementationRegistry.PROPERTY,
                    "org.apache.xerces.dom.DOMImplementationSourceImpl");
                DOMImplementationRegistry registry =
                    DOMImplementationRegistry.newInstance();

                // Request an implementation that supports Load and Save
                DOMImplementationLS impl =
                    (DOMImplementationLS)registry.getDOMImplementation("LS");

                // Cast back to a plain DOMImplementation to create the Document
                DOMImplementation oldimpl = (DOMImplementation)impl;
                Document testdoc = oldimpl.createDocument("", "candy", null);

            } catch (Exception e){
                System.out.println(e.toString());
            }
        }
    }
First, let the system know where to find the class that represents the overall implementation. From there, create an instance of the registry and request an implementation that includes the appropriate features, in this case the LS (or Load and Save) feature. (You can also request a list of all appropriate implementations and loop through them, if you like.)
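As a rough sketch of that last option, the registry's getDOMImplementationList() method returns every implementation that matches a feature string. The class and method names below (ImplementationLister, countImplementations) are made up for illustration, and the sketch assumes no registry system property is set, so it relies on whatever implementation source your Java runtime bundles rather than on Xerces specifically.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.DOMImplementationList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class ImplementationLister {

    // Assumption: no registry system property is set, so the registry falls
    // back to the implementation source bundled with the Java runtime.
    static DOMImplementationRegistry newRegistry() {
        try {
            return DOMImplementationRegistry.newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation source available", e);
        }
    }

    // Prints each implementation that supports the requested features and
    // returns how many were found.
    static int countImplementations(String features) {
        DOMImplementationList list = newRegistry().getDOMImplementationList(features);
        for (int i = 0; i < list.getLength(); i++) {
            DOMImplementation impl = list.item(i);
            System.out.println(impl.getClass().getName()
                    + " (Core 3.0 supported: " + impl.hasFeature("Core", "3.0") + ")");
        }
        return list.getLength();
    }

    public static void main(String[] args) {
        System.out.println(countImplementations("LS") + " implementation(s) support LS");
    }
}
```

Which classes show up depends entirely on the parser you have installed, so treat the printed names as informational only.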
In the Guide entry on Load and Save I showed you how to use the implementation to create a Level 3-style parser. Here I'm casting back to a regular DOMImplementation to create a Document with no namespace information and a root element called candy.
Once you've actually created the Document, DOM Level 3 provides a wealth of new information about it, including the encoding information and URI. Level 3 also added new information for attributes, such as the ability to specify an attribute as an identifier (so you can request its element using getElementById()), and additional namespace capabilities.
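To make the identifier feature a little more concrete, here's a minimal sketch that flags an attribute as an ID with Element.setIdAttribute() and then looks the element up. The sku attribute, its value, and the class and method names are all hypothetical, and the sketch assumes the DOM implementation bundled with your Java runtime rather than a specific parser.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class IdAttributeDemo {

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled DOM implementation is used.
    static Document newDocument(String rootName) {
        try {
            return DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("Core")
                    .createDocument(null, rootName, null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Adds <product sku="m-100"/> to a new document, optionally flags sku as
    // an ID attribute, and returns whatever getElementById() finds for "m-100".
    static Element findBySku(boolean markAsId) {
        Document doc = newDocument("candy");
        Element product = doc.createElement("product");
        product.setAttribute("sku", "m-100");
        doc.getDocumentElement().appendChild(product);
        if (markAsId) {
            product.setIdAttribute("sku", true);
        }
        return doc.getElementById("m-100");
    }

    public static void main(String[] args) {
        System.out.println("Without ID flag: " + findBySku(false));
        System.out.println("With ID flag: " + findBySku(true).getTagName());
    }
}
```

Without a DTD or schema to declare the attribute's type, the lookup has nothing to go on until you flag the attribute yourself.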
Re-processing a document
In previous versions of DOM, it was clear how to validate a document when you were loading it, but once you did that, you could do anything you wanted to the Document object, whether or not it was permitted by whatever schema (note the little "s") you were using. DOM Level 3 lets you "normalize" the Document after it's been created. This process does a number of things, from removing empty text nodes and combining adjacent ones to adjusting white space. You can also, however, use the domConfig attribute of the Document to control other aspects of this processing. For example, you can specify that comments should be removed, or that the Document should be revalidated:
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSParser;
    import org.w3c.dom.*;

    public class Level3Dom {

        public static void main (String args[]){
            try {
                System.setProperty(DOMImplementationRegistry.PROPERTY,
                    "org.apache.xerces.dom.DOMImplementationSourceImpl");
                DOMImplementationRegistry registry =
                    DOMImplementationRegistry.newInstance();
                DOMImplementationLS impl =
                    (DOMImplementationLS)registry.getDOMImplementation("LS");

                // Parse the file with schema validation turned on
                LSParser builder = impl.createLSParser(
                    DOMImplementationLS.MODE_SYNCHRONOUS, null);
                DOMConfiguration config = builder.getDomConfig();
                config.setParameter("validate", Boolean.TRUE);
                config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
                config.setParameter("schema-location", "order.xsd");
                Document document = builder.parseURI("order.xml");

                // Make the document invalid
                Node root = document.getDocumentElement();
                root.appendChild(document.createTextNode("bogus data"));
                System.out.println("Bogus data added.");

                // Configure the Document itself, then re-process it
                DOMConfiguration docConfig = document.getDomConfig();
                docConfig.setParameter("validate", Boolean.TRUE);
                docConfig.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
                docConfig.setParameter("schema-location", "order.xsd");
                docConfig.setParameter("comments", Boolean.FALSE);
                document.normalizeDocument();

            } catch (Exception e){
                System.out.println(e.toString());
            }
        }
    }
In this case, I'm parsing the file (see Load and Save for more information) with validation turned on. After loading the file, I've added bogus data that will make the document invalid. I can then set validation for the Document itself (as opposed to the parser), and when I normalize the document, I'll get an error, as you can see in this output:
    Bogus data added.
    [Error] :-1:-1: cvc-complex-type.2.3: Element 'order' cannot have character [children], because the type's content type is element-only.
Other parameters you can control include the ability to check for well-formedness against a particular version of XML (that is, 1.0 vs. 1.1), the ability to request the canonical form of the document, and namespace and entity information.
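If you don't have a schema handy, you can still watch normalization at work. The sketch below, which needs no input files, builds a small tree by hand, turns off the comments parameter, and normalizes. The class name, method names, and sample content are invented for the example, and it assumes the DOM implementation that ships with your Java runtime supports Document.normalizeDocument().

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class NormalizeDemo {

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled DOM implementation is used.
    static Document newDocument(String rootName) {
        try {
            return DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("Core")
                    .createDocument(null, rootName, null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Builds <order> with text, a comment, and more text, then normalizes the
    // document with comments switched off. Returns the root element.
    static Element normalizedOrder() {
        Document doc = newDocument("order");
        Element root = doc.getDocumentElement();
        root.appendChild(doc.createTextNode("12 "));
        root.appendChild(doc.createComment("internal note"));
        root.appendChild(doc.createTextNode("boxes"));

        DOMConfiguration config = doc.getDomConfig();
        if (config.canSetParameter("comments", Boolean.FALSE)) {
            config.setParameter("comments", Boolean.FALSE);
        }
        doc.normalizeDocument();
        return root;
    }

    public static void main(String[] args) {
        Element root = normalizedOrder();
        System.out.println("Children after normalizing: " + root.getChildNodes().getLength());
        System.out.println("Text: " + root.getTextContent());
    }
}
```

Once the comment is gone, the two text nodes are adjacent, so normalization should fold them into one.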
Additional Document manipulation capabilities
Ever try to move a Node from one Document to another? The moment you try to append it to the appropriate parent element, you'll get a "wrong document" error. DOM Level 3 finally makes it not only possible, but easy, with adoptNode():
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSParser;
    import org.w3c.dom.*;

    public class Level3Dom {

        public static void main (String args[]){
            try {
                System.setProperty(DOMImplementationRegistry.PROPERTY,
                    "org.apache.xerces.dom.DOMImplementationSourceImpl");
                DOMImplementationRegistry registry =
                    DOMImplementationRegistry.newInstance();
                DOMImplementationLS impl =
                    (DOMImplementationLS)registry.getDOMImplementation("LS");

                // Build a small Document of our own
                DOMImplementation oldimp = (DOMImplementation)impl;
                Document testDoc = oldimp.createDocument("", "baseelement", null);
                Node testroot = testDoc.getDocumentElement();
                testroot.appendChild(testDoc.createElement("theChild"));

                // Parse the Document that will adopt the node
                LSParser builder = impl.createLSParser(
                    DOMImplementationLS.MODE_SYNCHRONOUS, null);
                Document document = builder.parseURI("order.xml");
                Node root = document.getDocumentElement();

                // Adopt first, then append
                document.adoptNode(testroot);
                root.appendChild(testroot);

            } catch (Exception e){
                System.out.println(e.toString());
            }
        }
    }
Note that the adoptNode() method doesn't actually add the Node to the tree, but rather puts it into a kind of limbo from which you can pluck it and add it to the tree. (Note also that if you adopt a Node that's already part of the Document in question, you'll be removing it from that tree. It's still part of the Document in that you can add it back in, however.)
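Here's a small sketch of that limbo, using two documents built in memory so it doesn't depend on order.xml. The class name and the tryAppend() helper are my own inventions for the example, and it assumes both documents come from the DOM implementation bundled with your Java runtime.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class AdoptDemo {

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled DOM implementation is used.
    static Document newDocument(String rootName) {
        try {
            return DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("Core")
                    .createDocument(null, rootName, null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Tries to append a node to the target's root element as-is.
    // Returns the DOMException code on failure, or 0 if the append succeeded.
    static short tryAppend(Document target, Node node) {
        try {
            target.getDocumentElement().appendChild(node);
            return 0;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document source = newDocument("baseelement");
        Document target = newDocument("order");
        Node child = source.getDocumentElement()
                .appendChild(source.createElement("theChild"));

        // Appending a node owned by another Document is rejected
        System.out.println("Wrong document? "
                + (tryAppend(target, child) == DOMException.WRONG_DOCUMENT_ERR));

        // adoptNode() changes the owner but leaves the node unattached
        Node adopted = target.adoptNode(child);
        System.out.println("Owned by target: " + (adopted.getOwnerDocument() == target));
        System.out.println("Parent after adoption: " + adopted.getParentNode());

        // Now the append goes through
        System.out.println("Append result code: " + tryAppend(target, adopted));
    }
}
```

Note that adoptNode() is allowed to return null if the implementation can't adopt a particular node, so production code should check the return value before appending.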
DOM Level 3 also provides the ability to compare the position of nodes with Node.compareDocumentPosition(other), as well as the ability to determine whether two nodes are actually the same node or just equivalent, with Node.isSameNode(other) and Node.isEqualNode(arg).
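A quick sketch shows the difference between "same" and "equal," along with a position check. The class and helper names (CompareDemo, addProduct, follows) are hypothetical, and the code assumes the DOM implementation bundled with your Java runtime.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class CompareDemo {

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled DOM implementation is used.
    static Document newDocument(String rootName) {
        try {
            return DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("Core")
                    .createDocument(null, rootName, null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Appends <product>name</product> to the root element and returns it.
    static Element addProduct(Document doc, String name) {
        Element product = doc.createElement("product");
        product.setTextContent(name);
        doc.getDocumentElement().appendChild(product);
        return product;
    }

    // True if 'other' comes after 'reference' in document order.
    static boolean follows(Node reference, Node other) {
        return (reference.compareDocumentPosition(other)
                & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
    }

    public static void main(String[] args) {
        Document doc = newDocument("candy");
        Element first = addProduct(doc, "Mints");
        Element second = addProduct(doc, "Mints");

        System.out.println("Same node:  " + first.isSameNode(second));
        System.out.println("Equal node: " + first.isEqualNode(second));
        System.out.println("Second follows first: " + follows(first, second));
    }
}
```

Because compareDocumentPosition() returns a bitmask, test individual bits rather than comparing the result to a single constant.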
Improved text handling
DOM Level 3 also takes some of the aggravation out of dealing with text. As you may know, just because the text version of an XML document shows a "block" of text, say, as the child of an element, doesn't mean that it's a single text node. It could, in fact, be multiple text nodes adjacent to each other. (Note that you can fix this problem with the Document.normalizeDocument() and Node.normalize() methods, but that's not the point here.) You might also want all of the text in a node, even if it's actually content of one or more child elements. You can accomplish this task with the textContent property of a Node. For example, consider this XML document:
    <?xml version="1.0"?>
    <candy>
        <product>Mints</product>
        <product>Chocolate</product>
        <product>Circus Peanuts</product>
    </candy>
Running the application:
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSParser;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class Level3Dom {

        public static void main (String args[]){
            try {
                System.setProperty(DOMImplementationRegistry.PROPERTY,
                    "org.apache.xerces.dom.DOMImplementationSourceImpl");
                DOMImplementationRegistry registry =
                    DOMImplementationRegistry.newInstance();
                DOMImplementationLS impl =
                    (DOMImplementationLS)registry.getDOMImplementation("LS");

                LSParser builder = impl.createLSParser(
                    DOMImplementationLS.MODE_SYNCHRONOUS, null);
                Document document = builder.parseURI("candy.xml");

                Element allofit = document.getDocumentElement();
                System.out.println(allofit.getTextContent());

            } catch (Exception e){
                System.out.println(e.toString());
            }
        }
    }
outputs:
Mints Chocolate Circus Peanuts
DOM Level 3 also provides two new properties for a Text node. The first is wholeText, which consists of not only the text in that node, but also the text of any adjacent text nodes; in other words, the ones that look like a single block to a human. This corresponds to the new method, replaceWholeText(), which lets you, well, replace the whole text. The second new property is isElementContentWhitespace, which tells you whether the content of the Text node is what the spec says is "often abusively called 'ignorable whitespace'". (Note that you must validate the Document, either on loading or by normalizing, before the latter is available.)
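To see wholeText and replaceWholeText() in action, here's a minimal sketch that deliberately splits one block of text across two adjacent Text nodes. The class and method names are made up for the example, and it assumes the DOM implementation bundled with your Java runtime.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class WholeTextDemo {

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled DOM implementation is used.
    static Document newDocument(String rootName) {
        try {
            return DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("Core")
                    .createDocument(null, rootName, null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Creates <product> whose text is split across two adjacent Text nodes.
    static Element productWithSplitText(Document doc) {
        Element product = doc.createElement("product");
        product.appendChild(doc.createTextNode("Circus "));
        product.appendChild(doc.createTextNode("Peanuts"));
        doc.getDocumentElement().appendChild(product);
        return product;
    }

    public static void main(String[] args) {
        Document doc = newDocument("candy");
        Element product = productWithSplitText(doc);
        Text first = (Text) product.getFirstChild();

        System.out.println("This node only: " + first.getData());
        System.out.println("Whole text:     " + first.getWholeText());

        // Replaces the text of this node and its adjacent text nodes
        first.replaceWholeText("Mints");
        System.out.println("Text nodes left: " + product.getChildNodes().getLength());
        System.out.println("New content:     " + product.getTextContent());
    }
}
```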
New error management
DOM Level 3 has added the DOMErrorHandler to the mix, enabling you to react to errors as they're reported. You can check out Load and Save for an example of how to use it. New additions also include DOMError (which includes severity, message, type, relatedException, relatedData, and location properties) and DOMLocator (with lineNumber, columnNumber, byteOffset, utf16Offset, relatedNode, and uri properties).
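As a rough illustration of how these pieces fit together, the sketch below registers a DOMErrorHandler on a Load and Save parser and records the severity, line number, and message of each DOMError it receives. The class name, the collectErrors() helper, and the sample XML are all hypothetical, and the sketch assumes the LS implementation bundled with your Java runtime. Exactly how many times the handler is called for a single problem may vary by parser.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

class ErrorHandlerDemo {

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled Load and Save implementation is used.
    static DOMImplementationLS newLoadSaveImplementation() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No LS implementation available", e);
        }
    }

    // Parses the XML text and returns one summary line per reported DOMError.
    static List<String> collectErrors(String xml) {
        List<String> reports = new ArrayList<>();
        DOMImplementationLS impl = newLoadSaveImplementation();
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        DOMErrorHandler handler = error -> {
            DOMLocator where = error.getLocation();
            int line = (where == null) ? -1 : where.getLineNumber();
            reports.add("severity=" + error.getSeverity()
                    + " line=" + line
                    + " message=" + error.getMessage());
            // Ask the parser to keep going only after warnings
            return error.getSeverity() == DOMError.SEVERITY_WARNING;
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (RuntimeException e) {
            // A fatal error also aborts the parse with an exception;
            // the handler has already recorded the details.
        }
        return reports;
    }

    public static void main(String[] args) {
        // The <product> element is never closed, so this is not well-formed
        for (String report : collectErrors("<candy><product>Mints</candy>")) {
            System.out.println(report);
        }
    }
}
```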
Attaching non-XML data
DOM Level 3 also includes the ability to add "user data" through the Node.setUserData(key, data, handler) and Node.getUserData(key) methods and the UserDataHandler interface.
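Here's a minimal sketch of how that might look in practice: an application object (here just a String) is attached to an element under a key, and a UserDataHandler copies it onto any clone. The price key, its value, and the class and field names are hypothetical, and the sketch assumes the DOM implementation bundled with your Java runtime, which is expected to notify handlers when a node is cloned.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.UserDataHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

class UserDataDemo {

    // Copies the user data onto the clone when a node is cloned;
    // all other operations are ignored.
    static final UserDataHandler COPY_ON_CLONE = (operation, key, data, src, dst) -> {
        if (operation == UserDataHandler.NODE_CLONED && dst != null) {
            dst.setUserData(key, data, null);
        }
    };

    // Assumption: no registry system property is set, so the Java runtime's
    // bundled DOM implementation is used.
    static Document newDocument(String rootName) {
        try {
            return DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("Core")
                    .createDocument(null, rootName, null);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Creates <product> and attaches a price to it as user data.
    static Element pricedProduct(Document doc, String price) {
        Element product = doc.createElement("product");
        product.setUserData("price", price, COPY_ON_CLONE);
        doc.getDocumentElement().appendChild(product);
        return product;
    }

    public static void main(String[] args) {
        Document doc = newDocument("candy");
        Element product = pricedProduct(doc, "1.99");
        System.out.println("Original: " + product.getUserData("price"));

        Element copy = (Element) product.cloneNode(true);
        System.out.println("Clone:    " + copy.getUserData("price"));
    }
}
```

The user data never shows up in the serialized XML; it lives alongside the node only for as long as the Document is in memory.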