Java and the Apache XML Project
This chapter is a tour through the emerging world of Apache, specifically the Xerces Java XML parser. The chapter introduces the Xerces download component, its integrated parser, documentation, and samples. Then it focuses on the critical packages and shows how to construct working applications, using both the Document Object Model (DOM) and Simple API for XML (SAX) models. You may use these samples as frameworks for further development. Along the way, the chapter introduces every important class and interface, so that by the end of the chapter, you will be adept in the construction of XML applications.
We assume that you have at least an intermediate comfort level with Java, that you understand the concepts of paths and classpaths, that you have utilized Java packages, classes, and interfaces, and that you have experience writing, compiling, and running applications. If you meet these requirements, and are comfortable with previous chapters, then hop on board.
17.1 Apache Background
Apache is a story that warms the hearts of Internet traditionalists. Sometimes confused with IBM (thanks to that influential corporation's wide adoption of Apache software), Apache is actually a pure not-for-profit, open-source endeavor. Formed in 1995 by a half dozen Webmasters to consciously develop "a cog for the Internet," Apache emerged as the most widely accepted HTTP server, possibly the most successful piece of freely distributed software ever released in terms of market share. Its triumph has ensured that at least one standard, the HTTP protocol, remains simple and approachable, safeguarded from proprietary interests.
The Apache Software Foundation (at http://www.apache.org) now boasts 60+ members whose open-source vision has embraced emerging standards to provide practical, zero-cost implementations for technologies ranging from Perl to PHP to XML. This chapter, of course, focuses on the XML technologies (and trust us, all the others are just as fun as this one!).
The Apache project features the Xerces XML parsers (available in Java and C++) but also hosts a broad realm of XML technologies. Developers can access additional tools that assist Web publishing, SOAP development, and formatting. The following is a brief list of XML sub-projects, taken from the Apache XML Web site (http://xml.apache.org).
Xerces: XML parsers in Java, C++ (with Perl and COM bindings)
Xang: Rapid development of dynamic server pages, in JavaScript
Xalan: XSLT stylesheet processors, in Java and C++
SOAP: Simple Object Access Protocol
FOP: XSL formatting objects, in Java
Crimson: Java XML parser derived from the Sun Project X Parser
Cocoon: XML-based Web publishing, in Java
Batik: Java-based toolkit for Scalable Vector Graphics (SVG)
AxKit: XML-based Web publishing, in mod_perl
Many of these projects support recent additions to the XML set of standards. The Apache Xerces parser, for instance, has provided XML Schema functionality since early in its history: Xerces version 1.1 (released in May 2000) supported the working draft specification, and the parser has been updated regularly since then. Xerces has been fully XML Schema-compliant since version 1.1.3 (save for minor limitations, which are well documented at http://xml.apache.org/xerces-j/releases.html).
Note that we have referred to a singular parser, but a visit to http://xml.apache.org reveals links to two different parsers: Xerces Java 1 and Xerces Java 2. Xerces Java 2, or simply Xerces2, is much more recent and is a complete rewrite of the existing version 1 codebase. Xerces2 has a custom Xerces Native Interface (XNI), and its source code is said to be "much cleaner, more modular, and easier to maintain" than that of Xerces1. Xerces2 also implements the latest W3C XML Schema standards. Table 17.1 contains a matrix of implemented standards for both parsers.
TABLE 17.1 A Comparison of Xerces Parsers
Supported Standards | Xerces Java 1 | Xerces Java 2
Current Version (8/2002) | 1.4.4 | 2.0.2
XML Recommendation | 1.0 Recommendation | 1.0, Second Edition
XML Namespaces | Recommendation | Recommendation
Document Object Model | DOM Level 1 and 2 | DOM Level 2 Core, Events, Traversal, and Range Recommendations; DOM Level 3 Core, Abstract Schemas, Load, and Save Working Drafts
Simple API for XML (SAX) | SAX Level 1 and 2 | SAX Level 2 Core, Extension
Java APIs for XML Processing (JAXP) | JAXP 1.1 | JAXP 1.1
XML Schema | 1.0 | 1.0, Structures and Datatypes Recommendation, DOM Level 3 revalidation
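Table 17.1 lists DOM, SAX, and JAXP among the APIs that both parsers implement. As a rough illustration of what that means in practice, the following sketch parses one small in-memory document twice, once through the DOM entry point and once through the SAX entry point, using only the standard javax.xml.parsers factories. The class name JaxpSketch, its method names, and the sample document are hypothetical and chosen only for this example; the code also assumes a current JDK rather than the Java releases that were contemporary with Xerces 2.0.2.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpSketch {
    // Hypothetical sample document, used only for illustration.
    public static final String SAMPLE =
        "<book><title>XML</title><author>A. Writer</author></book>";

    // Wrap the XML text in a stream so that both parsers read the same bytes.
    static InputStream asStream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // DOM model: the whole document is built in memory, then navigated.
    public static String rootNameViaDom(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(asStream(xml));
        return doc.getDocumentElement().getTagName();
    }

    // SAX model: the parser reports events as it reads; nothing is retained
    // unless the handler chooses to keep it.
    public static List<String> elementNamesViaSax(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        parser.parse(asStream(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName); // record each start tag in document order
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM root element: " + rootNameViaDom(SAMPLE));
        System.out.println("SAX element order: " + elementNamesViaSax(SAMPLE));
    }
}
```

Because the sketch touches only JAXP, DOM, and SAX interfaces, it should behave the same way with either parser in the table, or with any other JAXP-compliant parser found on the classpath.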
Because the features are nearly parallel, your choice between the two parsers rests primarily on your desire for customization. Will you need access to the code for adjustment or extension (possibly to implement recent W3C features yourself)? If so, Xerces2 might be your best choice, but extend your test schedule appropriately because Xerces2 might be a bit less stable and reliable, and check the http://xml.apache.org Web site often for updates. Xerces2 now receives the majority of attention from Apache developers. For purposes of this chapter, we use Xerces2. When we refer to Xerces or "the parser," understand that we explicitly mean Xerces2.
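Since more than one parser can be present on a system, it can help to confirm which implementation an application actually loads. The following is a minimal sketch, assuming the parser is obtained through the standard JAXP factories; the class name ParserCheck and its method names are hypothetical. With a Xerces JAR on the classpath, the reported class names would typically begin with org.apache.xerces, whereas a JDK with no extra JARs usually reports its own bundled implementation under a different package name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class ParserCheck {
    // Name of the DOM factory implementation that JAXP resolved at run time.
    public static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Name of the concrete SAX parser class created by the resolved factory.
    public static String saxParserClass() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM factory: " + domFactoryClass());
        System.out.println("SAX parser:  " + saxParserClass());
    }
}
```

Running the class before and after adding a parser JAR to the classpath is one simple way to see whether the change took effect.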