Implementing Simple APIs for XML
In This Chapter
- The Packages in JAXP for Using SAX
- The Key JAXP Classes and Interfaces for SAX Support
- The JAXP and Reference Implementation JAR Files
- Creating a SAX-Parsing Application
- Summary
An XML document is essentially a text document that is meaningful only when an application can process it and extract the desired data. There are three standards by which an XML document can be processed: Simple API for XML (SAX), Document Object Model (DOM), and XSL Transformations (XSLT).
SAX is a public domain API that uses an event-driven mechanism to parse XML documents. The parser reads an XML document serially, and each time it recognizes an XML construct, such as a start tag, a run of text, or an end tag, it notifies the application that is using it. The notification is done by means of callback methods.
For example, assume an XML document has the following element:
<bookname>JAX APIs</bookname>
When the parser has read the start tag <bookname>, it calls a method named startElement(). For the text JAX APIs, it calls the characters() method, and when it reads the end tag </bookname>, it calls the endElement() method. These and other callback methods are declared in a number of interfaces that you implement in your application.
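To make the callback sequence concrete, here is a minimal sketch that assumes the JAXP implementation bundled with the JDK. The class name EventTrace and its trace() helper are hypothetical; the handler extends DefaultHandler (described later in this chapter) and simply records each callback it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the callbacks a SAX parser makes.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may split one run of text across several characters() calls.
        events.add("characters(" + new String(ch, start, length) + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    // Parses an XML string and returns the recorded callbacks in order.
    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventTrace handler = new EventTrace();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<bookname>JAX APIs</bookname>")) {
            System.out.println(event);
        }
    }
}
```

For the bookname element you would typically see startElement(bookname), then characters(JAX APIs), then endElement(bookname), although a parser is free to deliver the text in more than one characters() call.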
The DOM APIs enable an application to parse an XML document and create an in-memory tree representation of it. This makes it possible for applications to access and modify any part of the data. However, because the entire XML document is loaded into memory, DOM is memory-intensive and might have performance issues with large documents.
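For contrast with SAX, the following sketch shows the DOM style of access, assuming the JDK's built-in DocumentBuilderFactory. The class DomEditSketch and its renameBook() method are hypothetical; the point is that once the whole document is in memory as a tree, any node can be looked up and changed in place.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example: parse into a DOM tree, then modify one element.
class DomEditSketch {
    static String renameBook(String xml, String newName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The entire document is read into an in-memory tree here.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // Because the tree is in memory, any node can be located and changed.
        Element book = (Element) doc.getElementsByTagName("bookname").item(0);
        book.setTextContent(newName);
        return book.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(renameBook("<bookname>JAX APIs</bookname>", "Java XML APIs"));
    }
}
```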
SAX, on the other hand, processes documents serially and invokes a callback whenever it reaches an XML component. It does not build an in-memory representation of the document, so its memory use stays low even for large documents, and processing is typically faster than with the DOM APIs. However, because SAX keeps no representation of the XML data, it provides no way to go back through or modify the document; an application that needs to do so must build its own data structures from the callbacks.
XSLT is a language for describing transformations. It provides a mechanism by which an application can transform an XML document into another XML document or into other outputs, such as HTML.
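As a small illustration of XSLT, the sketch below uses the javax.xml.transform API that ships with the JDK. The stylesheet and the class name XsltSketch are hypothetical; the stylesheet turns the bookname element from the earlier example into an HTML heading.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: apply an inline XSLT stylesheet to produce HTML.
class XsltSketch {
    // Hypothetical stylesheet: <bookname>text</bookname> becomes <h1>text</h1>.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/bookname'><h1><xsl:value-of select='.'/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    static String toHtml(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml("<bookname>JAX APIs</bookname>"));
    }
}
```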
Many parsers, such as Xerces and Crimson, support the SAX and DOM standards. However, to use one of these parsers directly, you need to know its implementation-specific classes. Also, if you decide to change the parser implementation in your application, you need to change and recompile your code.
The Java API for XML Processing (JAXP) endorses the SAX, DOM, and XSLT standards and provides a standard and consistent set of APIs with which you can process and transform XML documents. These APIs are independent of any vendor-specific parser implementation. Therefore, your application can use the same API calls irrespective of whether you use Xerces, Crimson, or any other implementation that follows the SAX, DOM, and XSLT specifications. This is made possible by the pluggability layer, which enables you to choose a specific parser implementation at runtime. The pluggability layer also ensures that you do not need to change the application code even if you change the parser implementation.
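The pluggability layer is visible in the static factory method SAXParserFactory.newInstance(). According to the JAXP documentation, it looks for an implementation class in the javax.xml.parsers.SAXParserFactory system property, then in a jaxp.properties configuration file, then through a service-provider declaration, and finally falls back to the platform default. The sketch below (the class name PluggabilitySketch is hypothetical) only reports which implementation was selected; the application code is the same whichever one it is.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical example: report which SAXParserFactory implementation JAXP selected.
class PluggabilitySketch {
    static String implementationName() {
        // The application asks only for the abstract factory; JAXP decides
        // which concrete implementation class to instantiate.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(implementationName());
    }
}
```

To switch implementations without recompiling, you could run the same program with -Djavax.xml.parsers.SAXParserFactory set to the factory class of another parser on the class path.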
This chapter explains how JAXP enables you to process an XML document using a SAX-based parser. It is divided into two parts. The first part describes the JAXP packages that provide SAX support and the APIs that an application uses to process an XML document. The second part takes you step by step through using JAXP in your application to process an XML document with a SAX parser.
The Packages in JAXP for Using SAX
Three SAX packages contain the APIs that are required to use SAX in an application. These APIs are developed and maintained by the members of the XML-DEV mailing list. JAXP endorses the SAX 2.0 standard by including the SAX packages by reference. JAXP also includes a package that defines the core Java APIs for XML processing. This section discusses the SAX and JAXP packages in detail.
The core SAX 2.0 API and the SAX extensions are defined in the following packages.
The org.xml.sax Package
This package contains the basic SAX interfaces. The interfaces available through this package are listed in Table 3.1.
Table 3.1 The org.xml.sax Interfaces
Interface Name | Description
Attributes | This interface declares the methods with which an application can access the list of attributes of an element, either by index or by name.
ContentHandler | This interface declares the methods that an application should implement to handle the content event callbacks, such as the start of an element, that are generated by the SAX parser. This interface is described in detail later in the chapter.
DTDHandler | This interface declares the methods that inform the application about notations and unparsed entities. This interface is described in detail later in the chapter.
EntityResolver | This interface declares a method that enables the application to resolve external entities, for example by supplying its own input source for them. This interface is described in detail later in the chapter.
ErrorHandler | This interface declares the methods that enable you to implement error handling for XML parsing in your application.
Locator | This interface declares two types of methods: one for identifying the location in the document (in terms of line and column number) where a content event occurs, and the other for identifying the public and system identifiers of the document or entity being parsed. The location methods are often used to report where parsing failed because of erroneous XML data.
XMLFilter | This interface extends XMLReader with methods to set and get a parent XML reader, from which the filter obtains its events.
XMLReader | This interface declares the methods that are required for reading an XML document. It is the most crucial interface, because it defines the methods for parsing the XML document. It also defines the methods to get and set the objects that handle content, DTD, and error events and that resolve entities, as well as the methods to get and set parser features and properties.
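Several of these interfaces typically work together. The following sketch, which assumes the JDK's built-in parser and uses the hypothetical class name LocatorSketch, obtains an XMLReader, registers one object as both ContentHandler and ErrorHandler, keeps the Locator that the parser passes to setDocumentLocator(), and reads an attribute through the Attributes interface. The id attribute and the sample document are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler combining ContentHandler, ErrorHandler, Locator, and Attributes.
class LocatorSketch extends DefaultHandler {
    private Locator locator;
    final List<String> report = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // Parsers are encouraged (but not required) to call this before other events.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        int line = (locator != null) ? locator.getLineNumber() : -1;
        String id = atts.getValue("id"); // look up an attribute by its qualified name
        report.add(qName + " at line " + line + (id != null ? " id=" + id : ""));
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable errors (such as validity errors) arrive here.
        report.add("error at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    static List<String> scan(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LocatorSketch handler = new LocatorSketch();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.report;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>\n  <book id=\"b1\">JAX APIs</book>\n</books>";
        scan(xml).forEach(System.out::println);
    }
}
```

Well-formedness errors are fatal; DefaultHandler's fatalError() rethrows them, so parse() ends with a SAXParseException whose getLineNumber() points at the problem.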
The org.xml.sax.ext Package
This package extends the core SAX 2.0 API. It provides the interfaces with which you can access the lexical and DTD declaration information of an XML document. These interfaces are optional and might not be supported by all parsers.
The interfaces available through this package are listed in Table 3.2.
Table 3.2 The org.xml.sax.ext Interfaces
Interface Name | Description
DeclHandler | This interface declares the methods that inform the application about the element, attribute, and entity declarations in the DTD of an XML document.
LexicalHandler | This interface declares the methods that inform the application about lexical events such as comments, the start and end of CDATA sections and of the DTD, and the boundaries of parsed entities.
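Because the extension handlers are optional, they are not registered through a setter method; instead, a LexicalHandler is passed as the standard SAX property http://xml.org/sax/properties/lexical-handler. The sketch below assumes the JDK's built-in parser, which recognizes this property; the class name CommentCollector is hypothetical. It collects comments, which the core ContentHandler never reports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects comment text via the optional LexicalHandler.
class CommentCollector extends DefaultHandler implements LexicalHandler {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length).trim());
    }

    // The remaining lexical callbacks are not needed here, so they do nothing.
    @Override public void startDTD(String name, String publicId, String systemId) {}
    @Override public void endDTD() {}
    @Override public void startEntity(String name) {}
    @Override public void endEntity(String name) {}
    @Override public void startCDATA() {}
    @Override public void endCDATA() {}

    static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CommentCollector handler = new CommentCollector();
        // A parser that does not support this optional property throws
        // SAXNotRecognizedException or SAXNotSupportedException here.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<bookname><!-- draft title -->JAX APIs</bookname>"));
    }
}
```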
The org.xml.sax.helpers Package
This package contains helper classes, including default implementations of several interfaces defined in the org.xml.sax package and utility classes such as NamespaceSupport and XMLReaderFactory.
The classes contained in the org.xml.sax.helpers package are listed in Table 3.3.
Table 3.3 The org.xml.sax.helpers Classes
Class Name | Description
AttributesImpl | This class is a default implementation of the Attributes interface. It is also useful for making a persistent copy of an element's attributes.
DefaultHandler | This class provides default implementations, most of which do nothing, of the event handling interfaces of the org.xml.sax package: ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. Applications typically extend it and override only the callbacks they need.
LocatorImpl | This class implements the Locator interface.
NamespaceSupport | This class encapsulates the logic of namespace processing for an XML document.
ParserAdapter | This class implements the DocumentHandler and XMLReader interfaces. It wraps a SAX 1.0 parser and adapts it as a SAX 2.0 XML reader with feature, property, and namespace support. See the following sidebar "SAX 1.0 Versus SAX 2.0" to understand the difference between SAX 1.0 and SAX 2.0 and the need for this class.
XMLFilterImpl | This class is a base class for deriving XML filters. By default, it passes all requests to its parent reader and all events to the application's handlers unchanged, so a subclass overrides only the methods for the events it wants to modify.
XMLReaderAdapter | This class does the reverse of what the ParserAdapter class does; it adapts a SAX 2.0 XMLReader as a SAX 1.0 parser. See the following sidebar "SAX 1.0 Versus SAX 2.0" to understand the difference between SAX 1.0 and SAX 2.0 and the need for this class.
XMLReaderFactory | This is a factory class for creating an XML reader.
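A filter sits between the parser and the application's handlers. The following sketch derives a hypothetical UpperCaseFilter from XMLFilterImpl, assuming the JDK's built-in parser as the parent reader. It overrides only characters(); every other event passes through unchanged. Note that the application registers its ContentHandler on the filter and calls parse() on the filter, not on the parent.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that upper-cases character data and passes other events through.
class UpperCaseFilter extends XMLFilterImpl {
    UpperCaseFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Upper-casing can change the length, so pass the new array's own length on.
        char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
        super.characters(upper, 0, upper.length);
    }

    static String filteredText(String xml) throws Exception {
        XMLReader parent = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        UpperCaseFilter filter = new UpperCaseFilter(parent);
        StringBuilder text = new StringBuilder();
        // The application's handler is registered on the filter, not on the parent.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(filteredText("<bookname>JAX APIs</bookname>"));
    }
}
```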
SAX 1.0 Versus SAX 2.0
SAX 1.0 was released in May 1998. It contains eleven core classes and interfaces, along with three optional helper classes and five demonstration classes. These interfaces and classes are divided into two packages: org.xml.sax and org.xml.sax.helpers.
In SAX 1.0, the Parser interface is the most critical interface that an XML parser has to implement. This interface provides the different parse() methods and the methods that register handlers for document events, DTD events, errors, and entity resolution.
For an application using a SAX 1.0 parser, the most critical interface to implement is the DocumentHandler interface. This interface provides the necessary methods to handle the document event callbacks generated by the parser.
SAX 2.0 was released in May 2000, followed by the SAX 2.0.1 maintenance release in January 2002. SAX 2.0 added several features, such as complete support for namespaces, as well as the capability to generate callbacks for skipped entities and to look up an attribute's index by name.
The SAX 2.0 packaging structure is also a little different from the SAX 1.0 packaging structure. A few classes and interfaces have been deprecated and replaced with other interfaces and classes. The most important of these changes deals with the Parser and DocumentHandler interfaces, which are replaced with the XMLReader and ContentHandler interfaces, respectively.
However, to ensure that both SAX 1.0 and SAX 2.0 applications can work with each other, SAX 2.0 provides the ParserAdapter and XMLReaderAdapter classes. The ParserAdapter class makes a SAX 1.0 Parser act as a SAX 2.0 XMLReader, and adds feature, property, and namespace support.
Similarly, the XMLReaderAdapter class makes a SAX 2.0 XMLReader act as a SAX 1.0 Parser.
More information about SAX 1.0, SAX 2.0, and their differences is available from the official Web site for SAX at http://www.saxproject.org.
The javax.xml.parsers Package
This package contains the classes that enable an application to process the XML documents using either SAX or DOM parsers.
Table 3.4 lists the classes in the package that are used for processing an XML document using SAX.
Table 3.4 The javax.xml.parsers Classes for SAX
Class Name | Description
SAXParser | This abstract class wraps an implementation of the XMLReader interface, which you can obtain by calling its getXMLReader() method. An application can use an instance of this class to parse XML documents. This class is explained in detail later in the chapter.
SAXParserFactory | This abstract class defines the factory API that enables applications to obtain a SAX-based parser to parse XML documents.
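The two classes are normally used together: configure a SAXParserFactory, ask it for a SAXParser, and then either call the parser's parse() methods or work with its underlying XMLReader. The sketch below assumes the JDK's built-in implementation; the class name JaxpSaxSketch and the namespace URI urn:example:books are hypothetical. With namespace awareness switched on (it is off by default), startElement() receives the element's namespace URI and local name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: configure the factory, then parse through the wrapped XMLReader.
class JaxpSaxSketch {
    static String firstElementName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report namespace URIs and local names
        factory.setValidating(false);    // no DTD validation (also the default)
        SAXParser parser = factory.newSAXParser();

        // SAXParser wraps a SAX 2.0 XMLReader, which can be used directly.
        XMLReader reader = parser.getXMLReader();
        String[] first = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (first[0] == null) {
                    first[0] = "{" + uri + "}" + localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return first[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<b:bookname xmlns:b='urn:example:books'>JAX APIs</b:bookname>";
        System.out.println(firstElementName(xml));
    }
}
```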
Now that we've looked at the JAXP packages that provide support for using SAX-based parsers to parse XML documents, let's look at the key classes and interfaces that you will use most often.