The Key JAXP Classes and Interfaces for SAX Support
The classes and interfaces that are critical for an application to parse an XML document using a SAX parser are discussed in the following sections.
The SAXParserFactory Class
SAXParserFactory is an abstract class that defines the factory API that enables an application to get an instance of the SAXParser class. The instance of the SAXParser class provides the SAX-based parser to parse XML documents.
An instance of the SAXParserFactory class can be obtained by calling its static newInstance() method. The SAX parser implementation that is actually loaded depends on the value of the javax.xml.parsers.SAXParserFactory system property. The default implementation associated with the Java Web Services Developer Pack is the Xerces parser. The system property can be set with the -D option of the java command. For example, the following command ensures that the MyXMLHandler application uses Xerces as the SAX parser:
java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl MyXMLHandler
The SAXParserFactory class defines several methods that you can use to query and configure the parser. For example, you can use the setNamespaceAware() method to set the SAX parser to be namespace-aware. Setting the parser to be namespace-aware enables it to provide support for namespaces in an XML document. See the section "Handling Namespaces" in Chapter 4, "Advanced Use of SAX," to understand namespaces and how they are supported and processed by XML parsers.
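The factory configuration described above can be sketched as follows. This is a minimal illustration rather than code from the chapter: the class name FactoryConfigSketch and the helper method are hypothetical, and the parser implementation that is actually loaded depends on your runtime.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical class name, used only for this illustration.
class FactoryConfigSketch {

    // Configures a factory to be namespace-aware and returns a parser built from it.
    static SAXParser newNamespaceAwareParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // namespace awareness is off by default
        return factory.newSAXParser();     // the parser inherits the factory settings
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newNamespaceAwareParser();
        // Query the parser to confirm the setting took effect.
        System.out.println("namespace-aware: " + parser.isNamespaceAware());
    }
}
```

Settings must be applied to the factory before newSAXParser() is called; a parser that has already been created is not affected by later changes to the factory.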
The SAXParser Class
SAXParser is an abstract class that wraps an implementation of the XMLReader interface, which provides the necessary methods to parse an XML document. The SAXParser class itself defines several parse() methods with which an XML document can be parsed from a variety of input sources, such as files, URIs, and input streams, and its getXMLReader() method returns the underlying XMLReader.
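As a rough sketch of parsing from different input sources, the following example feeds the same small document to a SAXParser as an input stream, as an InputSource wrapping a character stream, and as a file. The class names, the sample document, and the temporary file are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class ParseSourcesSketch {
    static final String XML = "<order><item/><item/></order>";

    // Counts start tags; every other callback keeps its do-nothing default.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    static SAXParser newParser() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser();
    }

    // parse(InputStream, DefaultHandler)
    static int countFromStream() throws Exception {
        ElementCounter counter = new ElementCounter();
        try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
            newParser().parse(in, counter);
        }
        return counter.count;
    }

    // parse(InputSource, DefaultHandler), here wrapping a character stream
    static int countFromInputSource() throws Exception {
        ElementCounter counter = new ElementCounter();
        newParser().parse(new InputSource(new StringReader(XML)), counter);
        return counter.count;
    }

    // parse(File, DefaultHandler), using a temporary file created for the demo
    static int countFromFile() throws Exception {
        Path file = Files.createTempFile("order", ".xml");
        file.toFile().deleteOnExit();
        Files.write(file, XML.getBytes(StandardCharsets.UTF_8));
        ElementCounter counter = new ElementCounter();
        newParser().parse(file.toFile(), counter);
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countFromStream());
        System.out.println(countFromInputSource());
        System.out.println(countFromFile());
    }
}
```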
Table 3.5 describes the methods that are defined in the XMLReader interface and are critical for a SAX parser. You will see many of these methods used in the examples later in this chapter, as well as in Chapter 4.
Table 3.5 XMLReader Methods
Method | Description
getContentHandler() | Returns the content event handler registered with the parser.
getDTDHandler() | Returns the DTD event handler registered with the parser.
getEntityResolver() | Returns the entity resolver registered with the parser.
getErrorHandler() | Returns the error handler registered with the parser.
getFeature(java.lang.String name) | Looks up the value of the feature named in the argument. The returned value is either true or false. The argument has to be a fully qualified URI. The list of standard features that a SAX parser can support is available at http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description.
getProperty(java.lang.String name) | Looks up the value of the property named in the argument. The returned value is an Object whose type depends on the property. The argument has to be a fully qualified URI. The list of standard properties that a SAX parser can support is available at http://www.saxproject.org/apidoc/org/xml/sax/package-summary.html#package_description.
parse(InputSource input) | Reads the XML document from an InputSource, which can wrap a byte stream, a character stream, or a system identifier, and parses it.
parse(java.lang.String systemId) | Reads the XML document from a system identifier (URI) and parses it.
setContentHandler(ContentHandler handler) | Registers a content event handler with the parser.
setDTDHandler(DTDHandler handler) | Registers a DTD event handler with the parser.
setEntityResolver(EntityResolver resolver) | Registers an entity resolver with the parser.
setErrorHandler(ErrorHandler handler) | Registers an error event handler with the parser.
setFeature(java.lang.String name, boolean value) | Sets the state of a feature for the parser. For example, the value true for the feature http://xml.org/sax/features/external-parameter-entities ensures that the parser processes external parameter entities.
setProperty(java.lang.String name, java.lang.Object value) | Sets the value of a property for the parser. For example, setting the property http://xml.org/sax/properties/lexical-handler to an object that implements the LexicalHandler interface turns on lexical handling for the parser.
An instance of the SAXParser class can be obtained by using the newSAXParser() method of the SAXParserFactory class.
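The following sketch shows one way to obtain a SAXParser from the factory, reach its XMLReader through getXMLReader(), and use a few of the methods listed in Table 3.5. The class name and the sample document are assumptions made for this illustration; the feature URI is the standard SAX namespaces feature.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class XmlReaderSketch {
    static final String NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces";

    // Returns the state of the namespaces feature on a reader created
    // from a factory with the given namespace-awareness setting.
    static boolean namespacesFeature(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        return reader.getFeature(NAMESPACES_FEATURE);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces feature: " + namespacesFeature(true));

        // Register a handler and parse directly through the XMLReader.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader("<greeting>hello</greeting>")));
        System.out.println("handler registered: " + (reader.getContentHandler() == handler));
    }
}
```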
CAUTION
Note that the SAXParserFactory class is not thread-safe. However, as long as a factory instance is used by only one thread at a time, an application can use that instance to obtain multiple instances of SAXParser. Like the SAXParserFactory class, the SAXParser class is not thread-safe.
The DefaultHandler Class
DefaultHandler is an adapter class that implements the four core SAX 2.0 handler interfaces and provides do-nothing implementations of the interface methods. These four interfaces are as follows:
ContentHandler
ErrorHandler
DTDHandler
EntityResolver
You can extend this class and override only the callback methods that your application needs. This is more convenient than implementing each interface directly, which would require you to write a do-nothing body for every method you do not use.
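A minimal sketch of this adapter pattern follows. The class names and the sample document in main() are hypothetical. The handler overrides a single callback, yet the same instance can be registered in all four handler roles because DefaultHandler supplies do-nothing versions of everything else.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class DefaultHandlerSketch {

    // Overrides one callback; all other methods keep the inherited defaults.
    static class ElementNameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            names.add(qName);
        }
    }

    static List<String> elementNames(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementNameCollector handler = new ElementNameCollector();
        // One object serves as ContentHandler, ErrorHandler, DTDHandler, and EntityResolver.
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.setDTDHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<library><book/><book/></library>"));
    }
}
```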
The ContentHandler Interface
The ContentHandler interface defines the methods that an application implements to handle the events generated as the parser recognizes the logical content of a document, such as elements and character data. For example, when the parser encounters the start tag of an element in the XML source, it invokes the startElement() method of the ContentHandler interface. This is probably the most critical interface, and almost all applications will need to implement at least some of the methods defined in it.
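To make the flow of content events concrete, the following sketch records the order in which the main ContentHandler callbacks fire. The class names and the sample document are assumptions made for this illustration, and the handler extends DefaultHandler only to avoid spelling out the callbacks it does not use. Because a parser may deliver character data in several chunks, the text is buffered and recorded once the next tag is reached.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class ContentEventsSketch {

    static class TraceHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            events.add("startElement " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            events.add("endElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);   // may be called more than once per text run
        }

        // Records the buffered character data, if any, as a single event.
        private void flushText() {
            if (text.length() > 0) {
                events.add("characters " + text);
                text.setLength(0);
            }
        }
    }

    static List<String> trace(String xml) throws Exception {
        TraceHandler handler = new TraceHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<note><to>Ann</to></note>")) {
            System.out.println(event);
        }
    }
}
```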
The ErrorHandler Interface
This interface declares the methods that enable you to implement customized error handling in your application. It provides the error(), fatalError(), and warning() methods to handle the different possible error conditions. Note that you should register the instance of the ErrorHandler interface with the XML reader. Otherwise, the errors generated during the parsing of the XML document will go unreported, except that a SAXParseException is thrown for fatal errors, and you might get some strange output. You will learn how to register and use the ErrorHandler methods later in the "Handling Errors" section in this chapter.
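As a minimal sketch of the registration step, the following example implements ErrorHandler, registers it with the XML reader, and parses a document that is not well-formed. The class names, the message format, and the sample documents are assumptions made for this illustration; it is not the error-handling code developed later in the chapter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical class name, used only for this illustration.
class ErrorHandlerSketch {

    // Records every report instead of letting it go unnoticed.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> reports = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            reports.add(describe("warning", e));
        }

        @Override
        public void error(SAXParseException e) {
            reports.add(describe("error", e));
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            reports.add(describe("fatal", e));
            throw e;   // parsing cannot reliably continue after a fatal error
        }

        private static String describe(String level, SAXParseException e) {
            return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    static List<String> check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do here.
        }
        return handler.reports;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a><b></a>"));   // mismatched tags, so a fatal error is reported
        System.out.println(check("<a><b/></a>"));  // well-formed, so the list is empty
    }
}
```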
The DTDHandler Interface
This interface declares the methods that inform the application about notations and unparsed entities. Notations are used to represent non-parseable (binary) data, such as an image or a multimedia file, as entities in an XML document. This mechanism to represent binary data as an entity is a two-step process.
First, you need to provide an entity declaration for the binary data with the NDATA keyword. For example, to represent a movie file, the entity declaration could be like this:
<!ENTITY aMovieFile SYSTEM "http://xxxx.xxxx.xxxx/xxxx" NDATA avi>
The NDATA keyword specifies that the data in the entity is not parseable and is in the format identified by the notation named avi.
Next, the DTD has to include a declaration for the notation. The DTD entry for the previous notation will be as follows:
<!NOTATION avi SYSTEM "http://xxxx.xxxx.xxxx/xxxx">
The DTDHandler interface declares two methods, notationDecl() and unparsedEntityDecl(), that are invoked when the parser reads a notation declaration and an unparsed entity declaration, respectively. These methods pass the notation and unparsed entity information on to the calling application.
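The two callbacks can be sketched as follows. The document, the clip element, and the example.invalid addresses are hypothetical placeholders chosen for this illustration; the parser reports the declarations but does not fetch the unparsed data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class DtdHandlerSketch {
    // A document whose internal DTD subset declares a notation and an unparsed entity.
    static final String XML =
            "<!DOCTYPE clip [\n"
            + "  <!ELEMENT clip EMPTY>\n"
            + "  <!ATTLIST clip src ENTITY #REQUIRED>\n"
            + "  <!NOTATION avi SYSTEM \"http://example.invalid/avi-player\">\n"
            + "  <!ENTITY aMovieFile SYSTEM \"http://example.invalid/movie.avi\" NDATA avi>\n"
            + "]>\n"
            + "<clip src=\"aMovieFile\"/>";

    // DefaultHandler implements DTDHandler, so only these two callbacks are overridden.
    static class DeclarationCollector extends DefaultHandler {
        final List<String> notations = new ArrayList<>();
        final List<String> unparsedEntities = new ArrayList<>();

        @Override
        public void notationDecl(String name, String publicId, String systemId) {
            notations.add(name);
        }

        @Override
        public void unparsedEntityDecl(String name, String publicId,
                                       String systemId, String notationName) {
            unparsedEntities.add(name + " uses notation " + notationName);
        }
    }

    static DeclarationCollector collect() throws Exception {
        DeclarationCollector handler = new DeclarationCollector();
        // SAXParser.parse() registers the DefaultHandler in every handler role,
        // including that of DTDHandler.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(XML)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        DeclarationCollector result = collect();
        System.out.println("notations: " + result.notations);
        System.out.println("unparsed entities: " + result.unparsedEntities);
    }
}
```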
The EntityResolver Interface
This interface declares the resolveEntity() method, which enables the application to intercept and process external entities. External entities are entities that are defined outside the XML document being parsed, such as an external DTD or an external parameter entity. Although by default the parser has the capability to process external entities automatically, this interface can be used to customize external entity resolution. The customization of external entity resolution is covered in detail in Chapter 4.
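As a small preview, the following sketch shows one possible customization: the resolver supplies the text of an external DTD from memory instead of letting the parser fetch it. The class name, the example.invalid address, and the DTD content are hypothetical, and the sketch assumes a parser that reads the external DTD subset even when it is not validating, as the Xerces-based default does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class EntityResolverSketch {
    static final String DTD_URI = "http://example.invalid/note.dtd";
    static final String XML =
            "<!DOCTYPE note SYSTEM \"" + DTD_URI + "\">\n"
            + "<note>&greeting;</note>";

    // Parses XML, serving the external DTD from a string, and returns the element text.
    // Every system identifier the parser asks for is appended to requestedIds.
    static String parse(List<String> requestedIds) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        reader.setEntityResolver((publicId, systemId) -> {
            requestedIds.add(systemId);
            if (DTD_URI.equals(systemId)) {
                // Local stand-in for the external DTD; it defines one entity.
                return new InputSource(new StringReader("<!ENTITY greeting \"hello\">"));
            }
            return null;   // let the parser resolve anything else in the default way
        });

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(XML)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> requested = new ArrayList<>();
        System.out.println("text: " + parse(requested));
        System.out.println("requested: " + requested);
    }
}
```

Returning null from the resolver tells the parser to fall back to its default behavior for that entity.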
This concludes the first part of this chapter. The next part will take you through a series of examples that will show you how SAX is used to process XML documents.
NOTE
To work through the examples, you'll first need to download the APIs. See Appendix A, "Installing the JAX APIs," to learn more about the process of downloading JAXP and the JAXP reference implementation provided by Sun. You will also need to download the Java Web Services Developer Pack (JWSDP). The JWSDP includes JAXP, as well as the Xalan and Xerces APIs. Xerces and Xalan are the default JAXP reference implementations for SAX, DOM, and XSLT.
After downloading the JWSDP, you need to set up the JAR files for JAXP and the JAXP reference implementation. After the JAR files are set up, you can begin coding the applications that use SAX.