
Sams Teach Yourself XML in 21 Days

Sep 16, 2005


Sams Teach Yourself XML in 21 Days, 3rd Edition

Learn More Buy

Using SAX

Today's first example shows how to work with SAX. You'll start with the same example you used yesterday, except that today you'll use SAX. In particular, you're going to extract all the data from ch17_01.xml, which is shown in Listing 17.1.

Listing 17.1. A Sample XML Document for Use with SAX Methods (`ch17_01.xml`)

<?xml version="1.0" encoding="UTF-8"?>
<session>
   <committee type="monetary">
       <title>Finance</title>
       <number>17</number>
       <subject>Donut Costs</subject>
       <date>7/15/2005</date>
       <attendees>
           <senator status="present">
               <firstName>Thomas</firstName>
               <lastName>Smith</lastName>
           </senator>
           <senator status="absent">
               <firstName>Frank</firstName>
               <lastName>McCoy</lastName>
           </senator>
           <senator status="present">
               <firstName>Jay</firstName>
               <lastName>Jones</lastName>
           </senator>
       </attendees>
   </committee>
</session>

How do you handle this XML document by using SAX? You start in the main method by calling a new version of the childLoop method created yesterday. This method will fill the same array of strings, displayText, and you'll store the number of strings in a variable named numberLines. When the childLoop method is done, all you have to do is display all the text in the displayText array:

import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ch17_02 extends DefaultHandler
{
    static int numberLines = 0;
    static String indentation = "";
    static String displayText[] = new String[1000];

    public static void main(String args[])
    {
        ch17_02 parser = new ch17_02();
        parser.childLoop(args[0]);

        for(int loopIndex = 0; loopIndex < numberLines; loopIndex++){
            System.out.println(displayText[loopIndex]);
        }
    }
        .
        .
        .
}

In the childLoop method, you need two objects: a DefaultHandler object and a SAXParserFactory object. The DefaultHandler object tells SAX which object to call when it encounters the various nodes in the document; here, you'll use the present application object, because you've based its class on the DefaultHandler class:

public class ch17_02 extends DefaultHandler
{
        .
        .
        .

To refer to the current object, use the Java this keyword. Here's how to create the SAXParserFactory object:

public void childLoop(String uri)
{
    DefaultHandler saxHandler = this;
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        .
        .
        .
}

Now use this SAXParserFactory object to create a new SAX parser, and have that parser parse the document whose name was passed to childLoop (here, ch17_01.xml):

public void childLoop(String uri)
{
    DefaultHandler saxHandler = this;
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
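    try {
        SAXParser saxParser = saxFactory.newSAXParser();
        saxParser.parse(new File(uri), saxHandler);
    } catch (Exception e) {
        // Report configuration, I/O, and parse errors instead of hiding them
        System.err.println("Parsing failed: " + e);
    }
}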
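The following minimal sketch repeats the same factory-and-parser steps as childLoop, but parses a document held in a String rather than a file, using the parse overload that takes an InputStream. The class ElementCounter and its countElements method are hypothetical names invented for this illustration, and wrapping checked exceptions in IllegalStateException is only a shortcut to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts the elements in an in-memory document.
class ElementCounter extends DefaultHandler {
    private int elementCount = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        elementCount++;  // called once per opening (or empty-element) tag
    }

    // Same steps as childLoop: factory -> parser -> parse(content, handler).
    static int countElements(String xml) {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            InputStream in = new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8));
            saxParser.parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.elementCount;
    }

    public static void main(String[] args) {
        String xml = "<session><committee><title>Finance</title></committee></session>";
        System.out.println(countElements(xml));  // prints 3
    }
}
```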

Table 17.1 lists the significant methods of the SAXParserFactory class, and Table 17.2 lists the significant methods of the SAXParser class.

Table 17.1. Methods of the `javax.xml.parsers.SAXParserFactory` Class

Method	What It Does
`protected SAXParserFactory()`	Acts as the default constructor for the class.
`boolean isNamespaceAware()`	Returns `true` if the factory is configured to produce parsers that support XML namespaces.
`boolean isValidating()`	Returns `true` if the factory is configured to produce parsers that validate XML documents as they parse them.
`static SAXParserFactory newInstance()`	Returns a new `SAXParserFactory` object.
`abstract SAXParser newSAXParser()`	Returns a new `SAXParser` object, configured according to the factory's current settings.
`void setNamespaceAware(boolean awareness)`	Specifies whether parsers created by this factory will support XML namespaces.
`void setValidating(boolean validating)`	Specifies whether parsers created by this factory will validate XML documents.
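To see these settings in action, here is a small sketch that configures a factory and then asks the resulting parser, through its isNamespaceAware and isValidating methods, what it was configured to do. The class FactorySettings and its parserFlags method are hypothetical, and the sketch assumes the JDK's default JAXP implementation, where both settings start out false.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical example class exercising the factory setters and getters.
class FactorySettings {
    // Creates a parser from a factory configured with the given settings and
    // returns what the parser reports: {isNamespaceAware, isValidating}.
    static boolean[] parserFlags(boolean namespaceAware, boolean validating) {
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        saxFactory.setNamespaceAware(namespaceAware);
        saxFactory.setValidating(validating);
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            return new boolean[] { saxParser.isNamespaceAware(), saxParser.isValidating() };
        } catch (Exception e) {
            throw new IllegalStateException("Could not create parser", e);
        }
    }

    public static void main(String[] args) {
        SAXParserFactory defaults = SAXParserFactory.newInstance();
        // Both settings are off until you turn them on.
        System.out.println(defaults.isNamespaceAware() + " " + defaults.isValidating());
        boolean[] flags = parserFlags(true, false);
        System.out.println(flags[0] + " " + flags[1]);
    }
}
```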

Table 17.2. Methods of the `SAXParser` Class

Method	What It Does
`protected SAXParser()`	Acts as the default class constructor.
`abstract Parser getParser()`	Returns the SAX 1 `Parser` object wrapped by this `SAXParser`.
`abstract boolean isNamespaceAware()`	Returns `true` if this parser is configured to understand namespaces.
`abstract boolean isValidating()`	Returns `true` if this parser is configured to validate XML documents.
`void parse(File f, DefaultHandler dh)`	Parses the content of the specified file.
`void parse(InputSource is, DefaultHandler dh)`	Parses the content of the specified `InputSource` object.
`void parse(InputStream is, DefaultHandler dh)`	Parses the content of the specified `InputStream` object.
`void parse(String uri, DefaultHandler dh)`	Parses the content at the given URI, using the specified `DefaultHandler` object.
`abstract void setProperty(String name, Object value)`	Sets a property in the parser.

Now you've connected the SAX parser to your program and launched it, which means the parser will call various methods in your code to handle the various types of nodes it encounters. It can do this because you've based the program's main class on the SAX DefaultHandler class:

import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ch17_02 extends DefaultHandler
        .
        .
        .

The DefaultHandler class has a number of predefined methods, called callback methods, that the SAX parser will call:

characters: Called by the SAX parser for character data (text). The text of a single node may be delivered in more than one call.
endDocument: Called by the SAX parser when the end of the document is seen.
endElement: Called by the SAX parser when the closing tag of an element is seen.
startDocument: Called by the SAX parser when the start of the document is seen.
startElement: Called by the SAX parser when the opening tag of an element is seen.
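One way to watch these callbacks arrive is a handler that simply records their names. This is a sketch, not part of the chapter's program; the class CallbackRecorder and its record method are hypothetical. For a tiny document such as <title>Finance</title>, you can expect startDocument first and endDocument last, with the title element's startElement, characters, and endElement calls in between.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records each callback the parser makes.
class CallbackRecorder extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        events.add("startElement " + qualifiedName);
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        events.add("endElement " + qualifiedName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    static List<String> record(String xml) {
        CallbackRecorder recorder = new CallbackRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        for (String event : record("<title>Finance</title>")) {
            System.out.println(event);
        }
    }
}
```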

All the required callback methods are already implemented in the DefaultHandler class, but most of those implementations don't do anything; the main exception is fatalError, which by default rethrows the exception it is passed. That means you only have to override the methods you want to use, such as startDocument to catch the beginning of the document (as described later today) or endDocument to catch the end of the document. Table 17.3 lists the significant methods of the DefaultHandler class.

Table 17.3. Methods of the `DefaultHandler` Class

Method	What It Does
`DefaultHandler()`	Acts as the default class constructor.
`void characters(char[] ch, int start, int length)`	Handles character data (text).
`void endDocument()`	Handles the end of the document.
`void endElement(String uri, String localName, String qName)`	Handles the end of an element.
`void error(SAXParseException e)`	Handles a recoverable parser error.
`void fatalError(SAXParseException e)`	Handles a fatal parsing error; by default, rethrows the exception.
`void ignorableWhitespace(char[] ch, int start, int length)`	Handles ignorable whitespace (such as that used to indent a document) in element content.
`void notationDecl(String name, String publicId, String systemId)`	Handles a notation declaration.
`void processingInstruction(String target, String data)`	Handles an XML processing instruction (such as `<?xml-stylesheet ...?>`).
`InputSource resolveEntity(String publicId, String systemId)`	Resolves an external entity.
`void setDocumentLocator(Locator locator)`	Receives a `Locator` object that reports where in the document each event occurred.
`void skippedEntity(String name)`	Handles a skipped XML entity.
`void startDocument()`	Handles the beginning of the document.
`void startElement(String uri, String localName, String qName, Attributes attributes)`	Handles the start of an element.
`void startPrefixMapping(String prefix, String uri)`	Handles the start of a namespace mapping.
`void unparsedEntityDecl(String name, String publicId, String systemId, String notationName)`	Handles an unparsed entity declaration.
`void warning(SAXParseException e)`	Handles a parser warning.
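Because DefaultHandler already supplies every callback, you can even pass a plain new DefaultHandler() to parse. The following sketch (the class InheritedDefaults and its parsesCleanly method are hypothetical) does just that: a well-formed document parses silently, while a malformed one ends in a SAXException, because the inherited fatalError rethrows the error it is given.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: parses with a plain DefaultHandler, so every
// callback uses the inherited implementation.
class InheritedDefaults {
    // Returns true if the document parses without a fatal error.
    static boolean parsesCleanly(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;   // the inherited callbacks ran and did nothing
        } catch (SAXException e) {
            // The inherited fatalError rethrows, which ends parse() here.
            System.err.println("Fatal error: " + e.getMessage());
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesCleanly("<a><b/></a>"));  // prints true
        System.out.println(parsesCleanly("<a><b></a>"));   // prints false
    }
}
```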

Let's start by handling the start of the document.

Handling the Start of a Document

To handle the start of a document, you can implement the DefaultHandler startDocument method:

public void startDocument()
{
        .
        .
        .
}

When this method is called, the SAX parser has started reading the document. SAX doesn't pass the XML declaration itself to your code (it isn't reported as a processing instruction), so just put a generic XML declaration into the displayText array:

public void startDocument()
{
    displayText[numberLines] = indentation;
    displayText[numberLines] += "<?xml version=\"1.0\" encoding=\""+
        "UTF-8" + "\"?>";
    numberLines++;
}

Handling Processing Instructions

We can handle processing instructions by using the DefaultHandler processingInstruction method, which is called automatically when the SAX parser finds a processing instruction. The target of the processing instruction is passed to us, as is the data for the processing instruction, which means you can handle processing instructions like this:

public void processingInstruction(String target, String data)
{
    displayText[numberLines] = indentation;
    displayText[numberLines] += "<?";
    displayText[numberLines] += target;
    if (data != null && data.length() > 0) {
        displayText[numberLines] += ' ';
        displayText[numberLines] += data;
    }
    displayText[numberLines] += "?>";
    numberLines++;
}
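The following sketch records each target and data pair passed to processingInstruction; the class InstructionRecorder is hypothetical. The sample document starts with an XML declaration and then contains an xml-stylesheet instruction (the session.css file name is made up, and the parser never loads it). Only the xml-stylesheet instruction is recorded, because the XML declaration is not a processing instruction and is not reported through this method.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records every processing instruction.
class InstructionRecorder extends DefaultHandler {
    private final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        instructions.add(target + " | " + data);
    }

    static List<String> record(String xml) {
        InstructionRecorder handler = new InstructionRecorder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.instructions;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<?xml-stylesheet type=\"text/css\" href=\"session.css\"?>\n"
            + "<session/>";
        // Only the xml-stylesheet instruction appears in the list.
        System.out.println(record(xml));
    }
}
```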

Handling the Start of an Element

To handle the start of an element, use the startElement SAX method. This method is passed the namespace URI of the element, the local (unqualified) name of the element, the qualified name of the element, and the element's attributes (as an Attributes object):

public void startElement(String uri, String localName, String qualifiedName,
    Attributes attributes)
{
        .
        .
        .
}
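Which of these names you get depends on the factory settings. In the sketch below (the class ElementNames, its rootNames method, and the urn:example:senate namespace are all hypothetical), a namespace-aware parser fills in the namespace URI and local name of a prefixed element. With the JAXP default of no namespace awareness, the uri and localName arguments may be empty strings, which is one reason the chapter's code relies on the qualified name.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: captures the names passed for the root element.
class ElementNames extends DefaultHandler {
    private String[] rootNames;  // {uri, localName, qualifiedName}

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        if (rootNames == null) {  // keep only the first (root) element
            rootNames = new String[] { uri, localName, qualifiedName };
        }
    }

    static String[] rootNames(String xml, boolean namespaceAware) {
        ElementNames handler = new ElementNames();
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        saxFactory.setNamespaceAware(namespaceAware);
        try {
            saxFactory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.rootNames;
    }

    public static void main(String[] args) {
        String xml = "<s:session xmlns:s=\"urn:example:senate\"/>";
        System.out.println(Arrays.toString(rootNames(xml, true)));
        System.out.println(Arrays.toString(rootNames(xml, false)));
    }
}
```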

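Store an opening tag with the element's name in the displayText array, and add four spaces to the indentation so that the element's children are indented beneath it: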

public void startElement(String uri, String localName, String qualifiedName,
    Attributes attributes)
{
    displayText[numberLines] = indentation;

    indentation += "    ";

    displayText[numberLines] += '<';
    displayText[numberLines] += qualifiedName;
        .
        .
        .
    displayText[numberLines] += '>';
    numberLines++;
}

So far, so good. But what if the element has attributes?

Handling Attributes

If the element has attributes, you loop over them. SAX passes an empty Attributes object, not null, for an element with no attributes, so a loop that runs getLength() times already handles that case; checking whether the Attributes object passed to startElement is null is simply a defensive guard:

public void startElement(String uri, String localName, String qualifiedName,
    Attributes attributes)
{
    displayText[numberLines] = indentation;

    indentation += "    ";

    displayText[numberLines] += '<';
    displayText[numberLines] += qualifiedName;
    if (attributes != null) {
        .
        .
        .
    }
    displayText[numberLines] += '>';
    numberLines++;
}

Table 17.4 lists the methods of Attributes objects. Using an object that implements this interface, you can reach an attribute by its index, by its qualified name, or by its namespace URI and local name.

Table 17.4. `Attributes` Interface Methods

Method	What It Does
`int getIndex(java.lang.String uri, java.lang.String localPart)`	Returns the index of an attribute, by namespace and local name.
`int getIndex(java.lang.String qualifiedName)`	Returns the index of an attribute, given its qualified name.
`int getLength()`	Returns the number of attributes in the list.
`java.lang.String getLocalName(int index)`	Returns an attribute's local name, by index.
`java.lang.String getQName(int index)`	Returns an attribute's qualified name, by index.
`java.lang.String getType(int index)`	Returns an attribute's type, by index.
`java.lang.String getType(java.lang.String qualifiedName)`	Returns an attribute's type, by qualified name.
`java.lang.String getType(java.lang.String uri, java.lang.String localName)`	Returns an attribute's type, by namespace and local name.
`java.lang.String getURI(int index)`	Returns an attribute's namespace URI, by index.
`java.lang.String getValue(int index)`	Returns an attribute's value, by index.
`java.lang.String getValue(java.lang.String qualifiedName)`	Returns an attribute's value, by qualified name.
`java.lang.String getValue(java.lang.String uri, java.lang.String localName)`	Returns an attribute's value, by namespace name and local name.
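Here is a sketch of looking an attribute up by name rather than looping over all of them. The hypothetical StatusCounter class tallies the status attribute of each senator element, using getIndex to find the attribute (it returns -1 when the attribute is missing) and getValue to read it by index. The sample string is a trimmed-down version of the attendees in Listing 17.1.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts present and absent senators.
class StatusCounter extends DefaultHandler {
    private int present = 0;
    private int absent = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        if (!qualifiedName.equals("senator")) {
            return;
        }
        int index = attributes.getIndex("status");  // lookup by qualified name
        if (index < 0) {
            return;  // no status attribute on this element
        }
        String status = attributes.getValue(index);
        if ("present".equals(status)) {
            present++;
        } else if ("absent".equals(status)) {
            absent++;
        }
    }

    // Returns {present, absent}.
    static int[] count(String xml) {
        StatusCounter handler = new StatusCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return new int[] { handler.present, handler.absent };
    }

    public static void main(String[] args) {
        String xml = "<attendees>"
            + "<senator status=\"present\"><lastName>Smith</lastName></senator>"
            + "<senator status=\"absent\"><lastName>McCoy</lastName></senator>"
            + "<senator status=\"present\"><lastName>Jones</lastName></senator>"
            + "</attendees>";
        int[] counts = count(xml);
        System.out.println("present: " + counts[0] + ", absent: " + counts[1]);  // prints present: 2, absent: 1
    }
}
```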

Now loop over the attributes and use the getQName (get qualified name) and getValue methods to store the name and value of each attribute:

public void startElement(String uri, String localName, String qualifiedName,
    Attributes attributes)
{
    displayText[numberLines] = indentation;

    indentation += "    ";

    displayText[numberLines] += '<';
    displayText[numberLines] += qualifiedName;
    if (attributes != null) {
        int numberAttributes = attributes.getLength();
        for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) {
            displayText[numberLines] += ' ';
            displayText[numberLines] += attributes.getQName(loopIndex);
            displayText[numberLines] += "=\"";
            displayText[numberLines] += attributes.getValue(loopIndex);
            displayText[numberLines] += '"';
        }
    }
    displayText[numberLines] += '>';
    numberLines++;
}

Next, you'll take a look at handling text.

Handling Text

In SAX, you handle text by using the characters method. This method is passed an array of characters, the index in that array where the current chunk of text starts, and the number of characters in that chunk:

public void characters(char characters[], int start, int length)
{
    .
    .
    .
}

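Here's how to add the text to the displayText array. The code trims the text and skips any chunk that is empty after trimming or that still contains a line break, which filters out the whitespace used to indent the document: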

public void characters(char characters[], int start, int length)
{
    String characterData = (new String(characters, start, length)).trim();
    if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
        displayText[numberLines] = indentation;
        displayText[numberLines] += characterData;
        numberLines++;
    }
}
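Because a parser may split one run of text across several characters calls (this can happen, for example, around entity references such as &amp;), a more robust pattern than handling each call on its own is to collect the text in a buffer and use it when the element ends. The TextCollector class below is a hypothetical sketch of that approach, not the chapter's code; it assumes each element contains either text or child elements, not both, and the "Donuts &amp; Coffee" text is made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers each element's text before using it.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        buffer.setLength(0);  // discard whitespace seen before this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);  // may run several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(qualifiedName + "=" + text);
        }
        buffer.setLength(0);
    }

    static List<String> collect(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.texts;
    }

    public static void main(String[] args) {
        String xml = "<committee><title>Finance</title>"
            + "<subject>Donuts &amp; Coffee</subject></committee>";
        System.out.println(collect(xml));  // prints [title=Finance, subject=Donuts & Coffee]
    }
}
```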

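A validating parser reports whitespace in element content (such as the whitespace used for indentation) by calling a method named ignorableWhitespace instead of characters; nonvalidating parsers may do the same if they read the document's DTD. ch17_01.xml has no DTD, so here that whitespace actually arrives through the characters method, which is why characters trims and filters its text. If you want to handle ignorable whitespace like any other text, you can simply pass it on to the characters method you just implemented (note that this line is commented out here because the example supplies its own indentation):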

public void ignorableWhitespace(char characters[], int start, int length)
{
    //characters(characters, start, length);
}
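You can check which method receives the indentation in a document without a DTD by using a small probe. This sketch (the class WhitespaceProbe and its probe method are hypothetical) counts the whitespace-only chunks passed to characters and the number of ignorableWhitespace calls; with the JDK's default nonvalidating parser and no DTD, the whitespace should all arrive through characters.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts where whitespace-only text is reported.
class WhitespaceProbe extends DefaultHandler {
    private int whitespaceViaCharacters = 0;
    private int ignorableWhitespaceCalls = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        if (new String(ch, start, length).trim().isEmpty()) {
            whitespaceViaCharacters++;
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableWhitespaceCalls++;
    }

    // Returns {whitespace chunks seen by characters, ignorableWhitespace calls}.
    static int[] probe(String xml) {
        WhitespaceProbe handler = new WhitespaceProbe();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return new int[] { handler.whitespaceViaCharacters, handler.ignorableWhitespaceCalls };
    }

    public static void main(String[] args) {
        // No DTD, so the parser cannot know this whitespace is ignorable.
        int[] counts = probe("<attendees>\n    <senator/>\n</attendees>");
        System.out.println("characters: " + counts[0] + ", ignorableWhitespace: " + counts[1]);
    }
}
```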

Handling the End of Elements

Besides the startElement method, which is called when the SAX parser sees the beginning of an element, we can also implement the endElement method to handle an element's closing tag. Here's how that looks in this example:

public void endElement(String uri, String localName, String qualifiedName)
{
    indentation = indentation.substring(0, indentation.length() - 4);
    displayText[numberLines] = indentation;
    displayText[numberLines] += "</";
    displayText[numberLines] += qualifiedName;
    displayText[numberLines] += '>';
    numberLines++;
}
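The indentation string in this example works as a depth counter: startElement adds four spaces and endElement removes them. The hypothetical DepthTracker class below sketches the same bookkeeping with an integer, recording the deepest nesting level seen and reporting the depth after parsing, which returns to zero when every opening tag has been matched by a closing tag.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: tracks nesting depth the way the listing
// tracks its indentation string.
class DepthTracker extends DefaultHandler {
    private int depth = 0;
    private int maxDepth = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        depth++;                               // like indentation += "    "
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        depth--;                               // like removing four spaces
    }

    // Returns {maximum depth, depth after parsing}.
    static int[] measure(String xml) {
        DepthTracker handler = new DepthTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return new int[] { handler.maxDepth, handler.depth };
    }

    public static void main(String[] args) {
        String xml = "<session><committee><attendees><senator>"
            + "<firstName>Thomas</firstName>"
            + "</senator></attendees></committee></session>";
        System.out.println(measure(xml)[0]);  // prints 5
    }
}
```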

Handling Errors and Warnings

SAX makes it easy to handle warnings and errors. We can implement the warning method to handle warnings, the error method to handle errors, and the fatalError method to handle errors that the SAX parser considers fatal enough to make it stop processing. Here's what the error handling looks like in this example:

public void warning(SAXParseException exception)
{
    System.err.println("Warning: " +
        exception.getMessage());
}

public void error(SAXParseException exception)
{
    System.err.println("Error: " +
        exception.getMessage());
}

public void fatalError(SAXParseException exception)
{
    System.err.println("Fatal error: " +
        exception.getMessage());
}
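Instead of printing problems immediately, a handler can collect them. In the sketch below, the hypothetical ProblemCollector class records each warning, error, and fatal error with its line number (from SAXParseException.getLineNumber), and fatalError rethrows the exception so that parsing stops at the first well-formedness error. Without validation turned on, the error method is not called for validity problems, so a malformed document is the easiest way to trigger a report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects problems instead of printing them.
class ProblemCollector extends DefaultHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException exception) {
        problems.add("Warning at line " + exception.getLineNumber());
    }

    @Override
    public void error(SAXParseException exception) {
        problems.add("Error at line " + exception.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        problems.add("Fatal error at line " + exception.getLineNumber());
        throw exception;  // stop parsing: the document is not well-formed
    }

    static List<String> check(String xml) {
        ProblemCollector handler = new ProblemCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // A fatal error rethrown by fatalError ends up here; it is already recorded.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // The <committee> element is never closed.
        System.out.println(check("<session>\n<committee>\n</session>"));
    }
}
```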

And that's it—now run your new SAX code and parse ch17_01.xml like this:

%java ch17_02 ch17_01.xml

TIP

As in yesterday's discussion, depending on how you've set your Java classpath environment variable, you might have to include the current directory, which holds ch17_02.class, in order to run the program. You can do that by using this at the command prompt:

set classpath=.

As shown in Figure 17.1, we've been able to read and extract all the data in the XML document by using SAX methods.

Figure 17.1 Parsing an XML document by using a SAX parser.

The complete Java code is in Listing 17.2.

Listing 17.2. Parsing an XML Document by Using Java SAX (`ch17_02.java`)

import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ch17_02 extends DefaultHandler
{
    static int numberLines = 0;
    static String indentation = "";
    static String displayText[] = new String[1000];

    public static void main(String args[])
    {
        ch17_02 parser = new ch17_02();
        parser.childLoop(args[0]);

        for(int loopIndex = 0; loopIndex < numberLines; loopIndex++){
            System.out.println(displayText[loopIndex]);
        }
    }

    public void childLoop(String uri)
    {
        DefaultHandler saxHandler = this;
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            saxParser.parse(new File(uri), saxHandler);
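        } catch (Exception e) {
            // Report configuration, I/O, and parse errors instead of hiding them
            System.err.println("Parsing failed: " + e);
        }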
    }

    public void startDocument()
    {
        displayText[numberLines] = indentation;
        displayText[numberLines] += "<?xml version=\"1.0\" encoding=\""+
            "UTF-8" + "\"?>";
        numberLines++;
    }

    public void processingInstruction(String target, String data)
    {
        displayText[numberLines] = indentation;
        displayText[numberLines] += "<?";
        displayText[numberLines] += target;
        if (data != null && data.length() > 0) {
            displayText[numberLines] += ' ';
            displayText[numberLines] += data;
        }
        displayText[numberLines] += "?>";
        numberLines++;
    }

    public void startElement(String uri, String localName,
        String qualifiedName, Attributes attributes)
    {
        displayText[numberLines] = indentation;

        indentation += "    ";

        displayText[numberLines] += '<';
        displayText[numberLines] += qualifiedName;
        if (attributes != null) {
            int numberAttributes = attributes.getLength();
            for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++){
                displayText[numberLines] += ' ';
                displayText[numberLines] += attributes.getQName(loopIndex);
                displayText[numberLines] += "=\"";
                displayText[numberLines] += attributes.getValue(loopIndex);
                displayText[numberLines] += '"';
            }
        }
        displayText[numberLines] += '>';
        numberLines++;
    }

    public void characters(char characters[], int start, int length)
    {
        String characterData = (new String(characters, start, length)).trim();
        if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
            displayText[numberLines] = indentation;
            displayText[numberLines] += characterData;
            numberLines++;
        }
    }

    public void ignorableWhitespace(char characters[], int start, int length)
    {
        //characters(characters, start, length);
    }

    public void endElement(String uri, String localName, String qualifiedName)
    {
        indentation = indentation.substring(0, indentation.length() - 4);
        displayText[numberLines] = indentation;
        displayText[numberLines] += "</";
        displayText[numberLines] += qualifiedName;
        displayText[numberLines] += '>';
        numberLines++;
    }

    public void warning(SAXParseException exception)
    {
        System.err.println("Warning: " +
            exception.getMessage());
    }

    public void error(SAXParseException exception)
    {
        System.err.println("Error: " +
            exception.getMessage());
    }

    public void fatalError(SAXParseException exception)
    {
        System.err.println("Fatal error: " +
            exception.getMessage());
    }
}
