17.3 "Hello Apache"
In this section, you gain a little momentum, and a success or two, before digging into the nitty-gritty of Xerces. You use basic DOM methods to create a simple XML document, and then you reverse the process and parse an existing XML document.
17.3.1 Your First Parser
Listing 17.1 is a complete program that creates a few XML elements and then displays the serialized XML. The name of the file is HelloApache.java.
LISTING 17.1 HelloApache Example
import java.io.StringWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.apache.xerces.dom.DocumentImpl;
import org.apache.xerces.dom.DOMImplementationImpl;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.Serializer;
import org.apache.xml.serialize.XMLSerializer;

public class HelloApache {
    public static void main (String[] args) {
        try {
            Document doc = new DocumentImpl();

            // Create Root Element
            Element root = doc.createElement("BOOK");

            // Create 2nd level Element and attach to the Root Element
            Element item = doc.createElement("AUTHOR");
            item.appendChild(doc.createTextNode("Bachelard.Gaston"));
            root.appendChild(item);

            // Create one more Element
            item = doc.createElement("TITLE");
            item.appendChild(doc.createTextNode("The Poetics of Reverie"));
            root.appendChild(item);

            item = doc.createElement("TRANSLATOR");
            item.appendChild(doc.createTextNode("Daniel Russell"));
            root.appendChild(item);

            // Add the Root Element to Document
            doc.appendChild(root);

            // Serialize DOM
            OutputFormat format = new OutputFormat (doc);
            // as a String
            StringWriter stringOut = new StringWriter ();
            XMLSerializer serial = new XMLSerializer (stringOut, format);
            serial.serialize(doc);

            // Display the XML
            System.out.println(stringOut.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Naturally, you could reduce the number of import statements, but type these out at first to get comfortable with the locations of important classes.
To compile and execute this example, type these statements at the command line:
javac -classpath xercesImpl.jar;xmlParserAPIs.jar;. HelloApache.java
java -cp xercesImpl.jar;xmlParserAPIs.jar;. HelloApache
Hopefully, your results match the screen shown in Figure 17.5.
FIGURE 17.5 Output of HelloApache example.
Those familiar with other XML APIs will recognize the Document and Element interfaces (mini-API references for both can be found in Section 17.5). Both interfaces permit calls to appendChild to attach DOM nodes, and the Document interface is most often used to create new DOM nodes (for example, through createElement and createTextNode).
But first you must create the DOM Document. The easiest way to do this is by following the example code:
Document doc = new org.apache.xerces.dom.DocumentImpl();
This might be considered cheating by purists because, theoretically, specific implementation classes should not be created directly. Instead, they would prefer indirect creation via factory classes and interfaces. Xerces2 provides these as well:
javax.xml.parsers.DocumentBuilderFactory dbf =
    javax.xml.parsers.DocumentBuilderFactory.newInstance();
javax.xml.parsers.DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.newDocument();
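The factory-based creation above can be sketched as a complete program. This is a minimal illustration that uses only the standard javax.xml.parsers and org.w3c.dom interfaces; the class name FactoryDocumentSketch and the helper addChild are hypothetical names introduced for this example. Printing the class of the returned Document is one simple way to see which parser implementation the factory actually selected on your machine.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FactoryDocumentSketch {

    // Builds a small BOOK tree through the JAXP factory instead of
    // instantiating an implementation class directly.
    static Document buildBook() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.newDocument();

        Element root = doc.createElement("BOOK");
        addChild(doc, root, "AUTHOR", "Bachelard.Gaston");
        addChild(doc, root, "TITLE", "The Poetics of Reverie");
        doc.appendChild(root);
        return doc;
    }

    // Hypothetical helper: creates <name>text</name> and attaches it to parent.
    static void addChild(Document doc, Element parent, String name, String text) {
        Element item = doc.createElement(name);
        item.appendChild(doc.createTextNode(text));
        parent.appendChild(item);
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = buildBook();
        // The concrete class name reveals which DOM implementation was chosen.
        System.out.println("Document implementation: " + doc.getClass().getName());
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
    }
}
```

The rest of the program depends only on the org.w3c.dom interfaces, so it does not need to change if a different parser is installed.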
Beware: If you have installed more than one XML parser implementation, you are not guaranteed to receive a Xerces2 Document in the javax.xml.parsers.DocumentBuilder's newDocument call. If you can be certain that the Xerces parser is the only XML parser accessible by your virtual machine, all should go well. But if you must have multiple XML parsers installed, use the Xerces-specific Document instantiator call. (There are also other workarounds
Also, notice the four lines of I/O calls that create the XML output string. You can find the XMLSerializer class in the org.apache.xml.serialize package (this, along with other utility classes, is introduced in Tables 17.3 through 17.10), which also includes HTML, XHTML, and simple text serializers. These serializer classes can output to either a java.io.OutputStream or a java.io.Writer (designated by the first parameter to the *Serializer constructor). The second parameter of the XMLSerializer constructor is an org.apache.xml.serialize.OutputFormat object, which is typically constructed from an entire Document object. This is a very interesting class: it gives you great control over the style and content of the output. See the API documentation for the classes referenced here to see your range of I/O options.
17.3.2 Parsing "Hello Apache"
So now that you have seen how to build a simple XML document in Xerces2, take a brief look at how to load and parse an existing XML document. Listing 17.2 is a small program that takes an XML document's file name as a command-line argument and outputs the file's contents.
LISTING 17.2 HelloApache2 Example
import java.io.StringWriter;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.Serializer;
import org.apache.xml.serialize.DOMSerializer;
import org.apache.xml.serialize.SerializerFactory;
import org.apache.xml.serialize.XMLSerializer;

public class HelloApache2 {
    public static void main (String[] args) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(args[0]);

            OutputFormat format = new OutputFormat (doc);
            StringWriter stringOut = new StringWriter ();
            XMLSerializer serial = new XMLSerializer (stringOut, format);
            serial.serialize(doc);

            System.out.println(stringOut.toString());
        } catch (FactoryConfigurationError e) {
            System.out.println("Unable to get a document builder factory: " + e);
        } catch (ParserConfigurationException e) {
            System.out.println("Parser was unable to be configured: " + e);
        } catch (SAXException e) {
            System.out.println("Parsing error: " + e);
        } catch (IOException e) {
            System.out.println("I/O error: " + e);
        }
    }
}
Compile HelloApache2.java exactly as before, with the explicit classpath specified (or set as you prefer; just do not forget that there are two Xerces JAR files that must be made available to the compiler).
If you run your address.xml file through HelloApache2, remembering that address.xsd must be in the same path, you get the somewhat cluttered output shown in Figure 17.6.
FIGURE 17.6 Output of HelloApache2 example.
You can tell that all your content is present, and if you save this output to a different file, Web browsers or other XML interpreters can read it. However, if you want cleaner output, look at the IndentPrinter class in the org.apache.xml.serialize package for some helpful hints; a full example is in the included sample class xerces-2_0_0\samples\xni\PSVIWriter.java. As of Xerces2 version 2.0.0, the only way to output XML cleanly with line breaks and tabs is to write your own serializer; several of the samples do this.
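For comparison, here is a minimal sketch of indented output that steps outside the Xerces serializer classes discussed in this section. It swaps in the standard JAXP transformation API (javax.xml.transform), assuming a JDK that bundles a Transformer implementation. The class name IndentedOutputSketch and its helper methods are hypothetical, and the exact whitespace produced depends on the transformer implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class IndentedOutputSketch {

    // Hypothetical helper: parses XML held in a String into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the document with an identity Transformer and asks it
    // to insert line breaks between elements.
    static String toIndentedString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<BOOK><AUTHOR>Bachelard.Gaston</AUTHOR>"
                + "<TITLE>The Poetics of Reverie</TITLE></BOOK>");
        System.out.println(toIndentedString(doc));
    }
}
```

This is not the Xerces approach described in the text; it simply shows that a DOM Document obtained from any parser can be handed to a different serializer.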
The code specifically responsible for the parsing is found in these three lines:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(args[0]);
This should look familiar, because it uses the same DocumentBuilderFactory and DocumentBuilder classes. But whereas you previously used the Document object to create XML elements, you now receive a whole XML document from the parse method of DocumentBuilder. Listing 17.2 passes the file name as a String, which parse treats as a URI; other overloads of parse accept a java.io.File, a java.io.InputStream, or an org.xml.sax.InputSource object (the latter can itself wrap a java.io.InputStream or a java.io.Reader).
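The InputSource variant of parse is handy when the XML is already in memory. The following is a minimal sketch, assuming only the standard JAXP and DOM classes; the class name ParseFromStringSketch and the method parse(String) are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseFromStringSketch {

    // Wraps the String in a Reader, the Reader in an InputSource,
    // and hands the InputSource to DocumentBuilder.parse.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<BOOK><TITLE>The Poetics of Reverie</TITLE></BOOK>");
        Element title = (Element) doc.getElementsByTagName("TITLE").item(0);
        // Prints the text inside the TITLE element.
        System.out.println(title.getTextContent());
    }
}
```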
In addition, take note of the four types of throwables that can be raised. The first two in Listing 17.2, javax.xml.parsers.FactoryConfigurationError (strictly an Error rather than an Exception) and javax.xml.parsers.ParserConfigurationException, alert you to major setup or configuration problems; you can then alert the system administrator to verify that only the correct Xerces JAR and class files are fed into the virtual machine. These might only need to be caught in the initialization stages. After the first successful parse, you probably only need to worry about the other, more common parsing exceptions: org.xml.sax.SAXException and java.io.IOException (which you should catch in every instance of XML document construction and parsing).
The DOM parser encapsulates an inner SAX parser, so be prepared to catch SAX exceptions even in pure DOM applications.
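To see a SAX exception surface from DOM code, consider this minimal sketch, which feeds a deliberately malformed document to a DocumentBuilder. It assumes only the standard JAXP classes; the class name ParseErrorSketch and the method describe are hypothetical. The parser may also print its own diagnostic to standard error before the exception reaches your code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ParseErrorSketch {

    // Tries to parse the given XML and reports what happened as a String.
    static String describe(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {
            // Subclass of SAXException that carries the error location.
            return "parse error at line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return "SAX error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            return "configuration error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<BOOK><TITLE>fine</TITLE></BOOK>"));
        // The TITLE element is never closed, so the document is not well formed.
        System.out.println(describe("<BOOK><TITLE>oops</BOOK>"));
    }
}
```

Catching SAXParseException before the more general SAXException keeps the line number available for error messages.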