XML Reference Guide

Mar 14, 2003


The Document Object Model Level 3 Core Recommendation provides capabilities that make it easier to work with the data contained in an XML Document. These new capabilities fall into several categories:

A standard way to create a Document object from scratch
Additional information about documents and individual nodes
The ability to re-process (or re-validate) a document while it's in use
Additional node manipulation capabilities
Easier text manipulation
Enhanced error management
The ability to attach non-XML data to a Node

So let's take a look at some of these areas.

Bootstrapping

One of the big aggravations in earlier versions of the Document Object Model was the fact that it didn't specify how to actually create a Document object. Sure, you could directly create a Document from a DOMImplementation object, but there wasn't a standard way to create a DOMImplementation, so that didn't help very much. Instead, you had to check the documentation for your parser to get the implementation-specific way to do it. DOM Level 3 solves the problem by creating the DOMImplementationRegistry. The registry includes each of the available DOMImplementation classes, each of which has its own set of capabilities. For example, you may need an implementation that can handle DOM Level 3 Load and Save.

To do that, you can request a list of appropriate implementations from the registry:

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.DOMImplementationList;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.Document;

public class Level3Dom {

   public static void main (String args[]){

      try {
        System.setProperty(DOMImplementationRegistry.PROPERTY,
             "org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = 
             DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = 
            (DOMImplementationLS)registry.getDOMImplementation("LS");

        DOMImplementation oldimpl = (DOMImplementation)impl;
        Document testdoc = oldimpl.createDocument("", "candy", null);

      } catch (Exception e){
        System.out.println(e.toString());
      }
   }
}

First, let the system know where to find the class that represents the overall implementation. From there, create an instance of the registry and request an implementation that includes the appropriate features, in this case the LS (or Load and Save) feature. (You can also request a list of all appropriate implementations and loop through them, if you like.)

In the Guide entry on Load and Save I showed you how to use the implementation to create a Level 3-style parser. Here I'm casting back to a regular DOMImplementation to create a Document with no namespace information and a root element called candy.
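If you'd rather look at every implementation that matches, the registry's getDOMImplementationList() method returns them all. What follows is only a minimal sketch: the class and method names are mine, and I'm assuming that no registry property has been set, so the lookup falls back to whatever implementation source ships with your Java runtime rather than the Xerces class used above.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.DOMImplementationList;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class ImplementationListSketch {

    // Prints every implementation that claims the requested features
    // and returns how many were found.
    static int countImplementations(String features) {
        try {
            DOMImplementationRegistry registry =
                DOMImplementationRegistry.newInstance();
            DOMImplementationList list =
                registry.getDOMImplementationList(features);
            for (int i = 0; i < list.getLength(); i++) {
                DOMImplementation candidate = list.item(i);
                System.out.println(candidate.getClass().getName());
            }
            return list.getLength();
        } catch (ReflectiveOperationException e) {
            // The registry could not load its implementation source.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countImplementations("LS") + " match(es)");
    }
}
```

The class names printed depend entirely on which parser is on your classpath, so treat the output as informational only.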

Once you've actually created the Document, DOM Level 3 provides a wealth of new information about it, including the encoding information and URI. Level 3 also adds new information for attributes, such as the ability to specify an attribute as an identifier (so you can request its element using getElementById()), and additional namespace capabilities.
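To make the identifier idea concrete, here's a small sketch that marks an attribute as an ID with Element.setIdAttribute() and then looks the element up. The class name, the sku attribute, and its value are all made up for illustration, and I'm again assuming the registry falls back to the implementation bundled with your Java runtime.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class IdAttributeSketch {

    // Creates an empty <candy/> document through the registry.
    static Document newCandyDocument() {
        try {
            DOMImplementationRegistry registry =
                DOMImplementationRegistry.newInstance();
            DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
            return impl.createDocument(null, "candy", null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the tag name of the element whose ID matches, or null.
    static String lookupById(String id) {
        Document doc = newCandyDocument();
        Element product = doc.createElement("product");
        product.setAttribute("sku", "m-100");
        doc.getDocumentElement().appendChild(product);

        // Declare the (made-up) sku attribute to be an identifier.
        product.setIdAttribute("sku", true);

        Element found = doc.getElementById(id);
        return found == null ? null : found.getTagName();
    }

    public static void main(String[] args) {
        Document doc = newCandyDocument();
        // Some of the new document-level information.
        System.out.println("XML version: " + doc.getXmlVersion());
        System.out.println("Document URI: " + doc.getDocumentURI());
        System.out.println("Input encoding: " + doc.getInputEncoding());
        System.out.println("Found: " + lookupById("m-100"));
    }
}
```

A document built in memory has no source file, so don't expect meaningful values for the URI or the input encoding here; they matter more for documents you've parsed.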

Re-processing a document

In previous versions of DOM, it was clear how to validate a document when you were loading it, but once you did that, you could do anything you wanted to the Document object, whether or not it was permitted by whatever schema (note the little "s") you were using. DOM Level 3 lets you "normalize" the Document after it's been created. This process does a number of things, from removing empty text nodes and combining adjacent ones to adjusting white space. You can also, however, use the domConfig attribute of the Document (a DOMConfiguration object) to control other aspects of this processing. For example, you can specify that comments should be removed, or that the Document should be revalidated:

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.DOMImplementationList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.*;
import org.w3c.dom.DOMConfiguration;

public class Level3Dom {

   public static void main (String args[]){

      try {
        System.setProperty(DOMImplementationRegistry.PROPERTY,
             "org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = 
             DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = 
            (DOMImplementationLS)registry.getDOMImplementation("LS");

        LSParser builder = impl.createLSParser(
                         DOMImplementationLS.MODE_SYNCHRONOUS, null);

        DOMConfiguration config = builder.getDomConfig();
        config.setParameter("validate", Boolean.TRUE);
        config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
        config.setParameter("schema-location",
                         "order.xsd");

        Document document = builder.parseURI("order.xml");

        Node root = document.getDocumentElement();
        root.appendChild(document.createTextNode("bogus data"));
        System.out.println("Bogus data added.");

        DOMConfiguration docConfig = document.getDomConfig();
        docConfig.setParameter("validate", Boolean.TRUE);
        docConfig.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
        docConfig.setParameter("schema-location", "order.xsd");
        docConfig.setParameter("comments", Boolean.FALSE);
        document.normalizeDocument();
 
      } catch (Exception e){
        System.out.println(e.toString());
      }
   }
}

In this case, I'm parsing the file -- see Load and Save for more information -- with validation turned on. After loading the file, I've added bogus data that will make the document invalid. I can then set validation for the Document itself (as opposed to the parser) and when I normalize the document, I'll get an error, as you can see in this output:

Bogus data added.
[Error] :-1:-1: cvc-complex-type.2.3: Element 'order' cannot have 
character [children], because the type's content type is element-only.

Other parameters you can control include the ability to check for well-formedness against a particular version of XML (i.e., 1.0 vs. 1.1), the ability to request the canonical form of the document, and namespace and entity information.
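You don't need a schema to see normalization at work. The following sketch, which uses hypothetical class and method names and assumes the implementation bundled with your Java runtime, builds a small document in memory, turns off the "comments" parameter, and normalizes. Before setting a parameter you can ask the DOMConfiguration whether it's supported with canSetParameter().

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class NormalizeSketch {

    // Builds <candy> with two adjacent text nodes and a comment,
    // normalizes the document, and returns the root element.
    static Element normalizedRoot() {
        Document doc;
        try {
            DOMImplementationRegistry registry =
                DOMImplementationRegistry.newInstance();
            DOMImplementation impl = registry.getDOMImplementation("XML 3.0");
            doc = impl.createDocument(null, "candy", null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }

        Element root = doc.getDocumentElement();
        root.appendChild(doc.createTextNode("Circus "));
        root.appendChild(doc.createTextNode("Peanuts"));
        root.appendChild(doc.createComment("remove me"));

        DOMConfiguration config = doc.getDomConfig();
        // Only change the parameter if this implementation allows it.
        if (config.canSetParameter("comments", Boolean.FALSE)) {
            config.setParameter("comments", Boolean.FALSE);
        }
        doc.normalizeDocument();
        return root;
    }

    public static void main(String[] args) {
        Element root = normalizedRoot();
        System.out.println("Children: " + root.getChildNodes().getLength());
        System.out.println("Text: " + root.getTextContent());
    }
}
```

After normalization the two text nodes should have been merged into one and the comment dropped, leaving a single child under the root.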

Additional Document manipulation capabilities

Ever try to move a Node from one Document to another? The moment you try to append it to the appropriate parent element, you'll get a "wrong document" error. Finally, DOM Level 3 makes it not only possible, but easy, with adoptNode:

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.DOMImplementationList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.*;
import org.w3c.dom.DOMConfiguration;

public class Level3Dom {

   public static void main (String args[]){

      try {
        System.setProperty(DOMImplementationRegistry.PROPERTY,
             "org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = 
             DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = 
            (DOMImplementationLS)registry.getDOMImplementation("LS");

        DOMImplementation oldimp = (DOMImplementation)impl;
        Document testDoc = oldimp.createDocument("", "baseelement", null);
        Node testroot = testDoc.getDocumentElement();
        testroot.appendChild(testDoc.createElement("theChild"));

        LSParser builder = impl.createLSParser(
                         DOMImplementationLS.MODE_SYNCHRONOUS, null);

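        Document document = builder.parseURI("order.xml");
        // Get the root of the parsed document so there's something to append to
        Node root = document.getDocumentElement();

        // Transfer ownership of the node, then place it in the tree
        document.adoptNode(testroot);
        root.appendChild(testroot);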

      } catch (Exception e){
        System.out.println(e.toString());
      }
   }
}

Note that the adoptNode() method doesn't actually add the Node to the tree, but rather puts it into a kind of limbo from which you can pluck it and add it to the tree. (Note also that if you adopt a Node that's already part of the Document in question, you'll be removing it from that tree. It's still part of the Document in that you can add it back in, however.)
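Here's a self-contained sketch of that "limbo" behavior that doesn't need order.xml. Both documents are created in memory, the class and method names are hypothetical, and I'm assuming the implementation bundled with your Java runtime.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class AdoptNodeSketch {

    static DOMImplementation implementation() {
        try {
            return DOMImplementationRegistry.newInstance()
                                            .getDOMImplementation("XML 3.0");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Adopts <theChild> into a second document without appending it.
    static Node adoptOnly() {
        DOMImplementation impl = implementation();
        Document source = impl.createDocument(null, "baseelement", null);
        Element child = source.createElement("theChild");
        source.getDocumentElement().appendChild(child);

        Document target = impl.createDocument(null, "order", null);
        // The node leaves its old tree and changes owner,
        // but it has no parent in the new document yet.
        return target.adoptNode(child);
    }

    // Adopts the node and then places it under the new root.
    static Node adoptAndAppend() {
        Node adopted = adoptOnly();
        Document target = adopted.getOwnerDocument();
        target.getDocumentElement().appendChild(adopted);
        return adopted;
    }

    public static void main(String[] args) {
        System.out.println("Parent after adopt: "
            + adoptOnly().getParentNode());
        System.out.println("Parent after append: "
            + adoptAndAppend().getParentNode().getNodeName());
    }
}
```

If you need the original to stay where it is, Document.importNode() makes a copy instead of moving the node.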

DOM Level 3 also provides the ability to compare the position of nodes with Node.compareDocumentPosition(other) as well as the ability to determine whether two nodes are actually the same node or just equivalent, with Node.isSameNode(other) and Node.isEqualNode(arg).
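As a quick illustration of those three methods, the sketch below builds two product elements and compares them. The class and method names are hypothetical, and I'm assuming the implementation bundled with your Java runtime. Note that compareDocumentPosition() returns a bitmask, so you test it against the DOCUMENT_POSITION_* constants on Node rather than comparing for equality.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class NodeComparisonSketch {

    // Builds <candy><product>Mints</product><product>Chocolate</product></candy>.
    static Element candyRoot() {
        Document doc;
        try {
            doc = DOMImplementationRegistry.newInstance()
                      .getDOMImplementation("XML 3.0")
                      .createDocument(null, "candy", null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.getDocumentElement();
        for (String name : new String[] {"Mints", "Chocolate"}) {
            Element product = doc.createElement("product");
            product.setTextContent(name);
            root.appendChild(product);
        }
        return root;
    }

    // True if the second product comes after the first in document order.
    static boolean secondFollowsFirst() {
        Element root = candyRoot();
        Node first = root.getFirstChild();
        Node second = root.getLastChild();
        short position = first.compareDocumentPosition(second);
        return (position & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
    }

    // A deep clone is "equal" to the original ...
    static boolean cloneIsEqual() {
        Node first = candyRoot().getFirstChild();
        return first.isEqualNode(first.cloneNode(true));
    }

    // ... but it is not the same node.
    static boolean cloneIsSame() {
        Node first = candyRoot().getFirstChild();
        return first.isSameNode(first.cloneNode(true));
    }

    public static void main(String[] args) {
        System.out.println("Second follows first: " + secondFollowsFirst());
        System.out.println("Clone is equal: " + cloneIsEqual());
        System.out.println("Clone is same: " + cloneIsSame());
    }
}
```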

Improved text handling

DOM Level 3 also takes some of the aggravation out of dealing with text. As you may know, just because the text version of an XML document shows a "block" of text, say, as the child of an element, doesn't mean that it's a single text node. It could, in fact, be multiple text nodes adjacent to each other. (Note that you can fix this problem with the Document.normalizeDocument() and Node.normalize() methods, but that's not the point here.) You might also want all of the text in a node, even if it's actually content of one or more child elements. You can accomplish this task with the textContent property of a Node. For example, consider this XML document:

<?xml version="1.0"?>

<candy>
  <product>Mints</product>
  <product>Chocolate</product>
  <product>Circus Peanuts</product>

</candy>

Running the application:

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.DOMImplementationList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Level3Dom {

   public static void main (String args[]){

      try {
        System.setProperty(DOMImplementationRegistry.PROPERTY,
             "org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = 
             DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = 
            (DOMImplementationLS)registry.getDOMImplementation("LS");

        LSParser builder = impl.createLSParser(
                         DOMImplementationLS.MODE_SYNCHRONOUS, null);

        Document document = builder.parseURI("candy.xml");
        Element allofit = (Element)document.getDocumentElement();
        System.out.println(allofit.getTextContent());

      } catch (Exception e){
        System.out.println(e.toString());
      }
   }
}

outputs:

  Mints
  Chocolate
  Circus Peanuts

DOM Level 3 also provides two new properties for a Text node. The first is wholeText, which consists of not only the text in that node, but also the text of any adjacent text nodes; in other words, the ones that look like a single block to a human. This corresponds to the new method, replaceWholeText(), which lets you, well, replace the whole text.
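Here's a small sketch of wholeText and replaceWholeText() in action. I build the adjacent text nodes by hand so the effect is visible; the class and method names are hypothetical, and I'm assuming the implementation bundled with your Java runtime.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class WholeTextSketch {

    // Returns a <product> element holding two adjacent text nodes.
    static Element splitProduct() {
        Document doc;
        try {
            doc = DOMImplementationRegistry.newInstance()
                      .getDOMImplementation("XML 3.0")
                      .createDocument(null, "candy", null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        Element product = doc.createElement("product");
        product.appendChild(doc.createTextNode("Circus "));
        product.appendChild(doc.createTextNode("Peanuts"));
        doc.getDocumentElement().appendChild(product);
        return product;
    }

    // Reads the logical block of text starting from the first node.
    static String wholeText() {
        Text first = (Text) splitProduct().getFirstChild();
        return first.getWholeText();
    }

    // Replaces the logical block of text and returns the element.
    static Element replaced(String newText) {
        Element product = splitProduct();
        Text first = (Text) product.getFirstChild();
        first.replaceWholeText(newText);
        return product;
    }

    public static void main(String[] args) {
        System.out.println("Whole text: " + wholeText());
        Element product = replaced("Mints");
        System.out.println("After replace: " + product.getTextContent()
            + " in " + product.getChildNodes().getLength() + " node(s)");
    }
}
```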

The second new property is isElementContentWhitespace, which tells you whether the content of the Text node is what the spec says is "often abusively called 'ignorable whitespace'". (Note that you must validate the Document, either on loading or by normalizing, before the latter is available.)

New error management

DOM Level 3 has added the DOMErrorHandler to the mix, enabling you to react to errors as they're reported. You can check out Load and Save for an example of how to use it. Other additions include DOMError (which includes severity, message, type, relatedException, relatedData, and location properties), and DOMLocator (with lineNumber, columnNumber, byteOffset, utf16Offset, relatedNode, and uri properties).
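As a rough sketch of how these pieces fit together, the code below registers a DOMErrorHandler through the parser's "error-handler" configuration parameter and records the severity, line number, and message of each DOMError. The class and method names are hypothetical, the malformed XML string is made up, and I'm assuming the implementation bundled with your Java runtime.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical helper class, not part of any library.
class ErrorHandlerSketch {

    // Parses the string and returns one report line per DOMError.
    static List<String> parseAndCollect(String xml) {
        DOMImplementationLS impl;
        try {
            impl = (DOMImplementationLS) DOMImplementationRegistry
                       .newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        LSParser parser = impl.createLSParser(
            DOMImplementationLS.MODE_SYNCHRONOUS, null);

        List<String> reports = new ArrayList<>();
        DOMErrorHandler handler = (DOMError error) -> {
            reports.add("severity " + error.getSeverity()
                + ", line " + error.getLocation().getLineNumber()
                + ": " + error.getMessage());
            return true; // ask the parser to continue if it can
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // A fatal error still ends the parse after the handler runs.
        }
        return reports;
    }

    public static void main(String[] args) {
        // <product> is never closed, so this is not well formed.
        for (String report : parseAndCollect("<candy><product></candy>")) {
            System.out.println(report);
        }
    }
}
```

The exact wording of the messages is up to the parser, so the handler should key off the severity and type rather than the message text.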

Attaching non-XML data

DOM Level 3 also includes the ability to add "user data" through the Node.setUserData(key, data, handler) and Node.getUserData(key) methods and UserDataHandler interface.
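A minimal sketch of how that might look follows. The class name, the "price" key, and its value are invented for illustration, and I'm assuming the implementation bundled with your Java runtime. The UserDataHandler gets called when the node is cloned, imported, adopted, renamed, or deleted, so you can decide what should happen to the attached data.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.UserDataHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

// Hypothetical helper class, not part of any library.
class UserDataSketch {

    static Element newProduct() {
        try {
            Document doc = DOMImplementationRegistry.newInstance()
                               .getDOMImplementation("XML 3.0")
                               .createDocument(null, "candy", null);
            Element product = doc.createElement("product");
            doc.getDocumentElement().appendChild(product);
            return product;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Attaches an application object to the node and reads it back.
    static Object storeAndRead() {
        Element product = newProduct();
        product.setUserData("price", Integer.valueOf(3), null);
        return product.getUserData("price");
    }

    // Returns the operation code the handler sees when the node is cloned,
    // or -1 if the handler was never called.
    static short operationOnClone() {
        Element product = newProduct();
        short[] seen = { -1 };
        UserDataHandler handler =
            (operation, key, data, src, dst) -> seen[0] = operation;
        product.setUserData("price", Integer.valueOf(3), handler);
        product.cloneNode(false);
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Stored: " + storeAndRead());
        System.out.println("Handler saw NODE_CLONED: "
            + (operationOnClone() == UserDataHandler.NODE_CLONED));
    }
}
```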
