XML Reference Guide

Mar 14, 2003

The Simple API for XML, or SAX, was developed by members of the XML-DEV mailing list. Rather than treating an XML document as a tree-like structure, SAX treats it as a series of events such as startDocument or endElement. To accomplish this, a SAX application typically consists of three classes.

The first class is the main class, which contains the main() method and starts the parser. The parser sends events to the second class, the ContentHandler, which acts on them. If the parser encounters a problem with the XML file, it sends a warning, error, or fatalError to the third class, the ErrorHandler.

Creating a SAX application involves processing events as they arrive, keeping in mind that the ContentHandler knows only about the current event; if you need information about previous events, you need to save it yourself. For example, consider this order file:

<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
    <lineitem itemid="C33">
       <item>3/4" Hex Bolt</item>
       <quantity>36</quantity>
       <unitprice currency="dollars">.35</unitprice>
    </lineitem>
    <lineitem itemid="M48">
       <item>Condenser</item>
       <quantity>1</quantity>
       <unitprice currency="dollars">2200</unitprice>
    </lineitem>
    <delivery>Overnight</delivery>
</order>

We can create a SAX application that lists the order information, including the extended total for each item and the grand total for the order. We'd start by creating the main SAX application:

import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.SAXException;
import org.xml.sax.InputSource;
import java.io.IOException;

public class OrderInfo {

    public static void main (String[] args){
    
       try {
       
          String parserClass = "org.apache.crimson.parser.XMLReaderImpl";
          XMLReader reader = XMLReaderFactory.createXMLReader(parserClass);

          reader.setContentHandler(new OrderProcessor());
          reader.setErrorHandler(new ErrorProcessor());

          InputSource file = new InputSource("order.xml");
          reader.parse(file);

       } catch (IOException ioe) {
          System.out.println("IO Exception: "+ioe.getMessage());
       } catch (SAXException se) {
          System.out.println("SAX Exception: "+se.getMessage());
       }     
    }

}

SAX is designed to enable the easy substitution of one parser for another, so the application starts by naming the class that will act as the parser for the document. The XMLReaderFactory uses this class name to create the actual XMLReader, which analyzes the file and sends events to the ContentHandler (or to the ErrorHandler, in the case of warnings and errors). The application then calls the XMLReader's setter methods to register the classes used for these purposes.

Finally, the XMLReader parses the InputSource, created from the URI of a local or remote file.
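
If the parser class named above is not available in your environment (Crimson shipped with older JDKs), one alternative is to let JAXP choose the parser. The following is only a sketch under that assumption; ReaderLookup and newReader() are hypothetical names used for illustration and are not part of the OrderInfo application.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class ReaderLookup {

    // Hypothetical helper: asks JAXP for whichever SAX parser the platform provides.
    public static XMLReader newReader()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP factories are not namespace-aware by default; enabling this
        // makes the parser fill in the localName arguments passed to handlers.
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        // The concrete class depends on the JDK or libraries in use.
        System.out.println("Using parser: " + reader.getClass().getName());
    }
}
```

The returned XMLReader accepts the same setContentHandler(), setErrorHandler(), and parse() calls shown in OrderInfo.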

The ContentHandler itself is more complex, but let's take it one step at a time.

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

public class OrderProcessor extends DefaultHandler
{
   public OrderProcessor ()
   {
      super();
   }

   double totalPrice = 0;

   public void startDocument() {
       totalPrice = 0;
   }

   public void endDocument() {
      System.out.println("Order total: "+totalPrice);
   }


}

Now, a ContentHandler class has all sorts of methods to implement, such as startDocument() and endDocument(), so SAX provides the DefaultHandler class, with empty implementations of all of the methods. By extending DefaultHandler, you can simply override the methods you need. In this case, we're going to gather information about the order, so we'll start by initializing the totalPrice variable when parsing starts and displaying its value when it's done.

The startDocument() and endDocument() events fire just once, but other events fire multiple times. For example, the first events in the sample document are:

startDocument
startElement (order)
characters (white space)
startElement (lineitem)
characters (white space)
startElement (item)
characters (3/4" Hex Bolt)
endElement (item)
characters (white space)
startElement (quantity)
characters (36)
endElement (quantity)
...

Now, it's important to understand that each of these events is completely independent of the others. When the characters event fires to note the 3/4" Hex Bolt (more on the characters() method in a moment), the handler has no way of knowing that the text is part of the item element. If this information is important (as it is here), we need to keep track of it ourselves.
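
To watch the event stream for yourself, a small tracing handler can help. This is a minimal sketch, not part of the order application: EventTrace and trace() are made-up names, and it assumes a JAXP parser is available through SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace extends DefaultHandler {

    // Events in the order the parser reported them.
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName is used because a default JAXP parser is not namespace-aware
        // and may leave localName empty.
        events.add("startElement (" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement (" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        events.add(text.trim().isEmpty() ? "characters (white space)"
                                         : "characters (" + text + ")");
    }

    // Hypothetical helper: parses an XML string and returns the recorded events.
    public static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<order><item>Condenser</item></order>")) {
            System.out.println(event);
        }
    }
}
```

Notice that each callback receives only its own arguments; nothing in characters() says which element the text belongs to.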

For our purposes, that means that when we close an element we're tracking, such as item or quantity, we need to store the text that's been flowing through the characters() method, like so:

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

public class OrderProcessor extends DefaultHandler
{
   public OrderProcessor ()
   {
      super();
   }

   String itemid = "";
   String itemname = "";
   String quantity = "0";
   String unitprice = "0";
   int quantityInt = 0;
   double unitpriceDbl = 0;
   double totalPrice = 0;

   String currentElement = "";

   StringBuffer thisText = new StringBuffer();

   public void startDocument() {
       totalPrice = 0;
   }

   public void endDocument() {
      System.out.println("Order total: "+totalPrice);
   }

   public void startElement (String namespaceUri, String localName,
                             String qualifiedName, Attributes attributes)
   {

      if (localName.equals("order")){
          String orderid = attributes.getValue("orderid");
          String customerid = attributes.getValue("customerNumber");
          System.out.println("Order "+orderid+" for customer "+customerid);
      } else if (localName.equals("lineitem")){
          itemid = attributes.getValue("itemid");  
      }

      currentElement = localName;

   }

   public void endElement (String namespaceUri, String localName,
                           String qualifiedName) throws SAXException
   {

      if (thisText.length() > 0) {
         if (localName.equals("item")){
            itemname = thisText.toString().trim();
         } else if (localName.equals("quantity")){
            quantity = thisText.toString().trim();
            quantityInt = Integer.parseInt(quantity);
         } else if (localName.equals("unitprice")){
            unitprice = thisText.toString().trim();
            unitpriceDbl = Double.parseDouble(unitprice);
         }  
         thisText.delete(0, thisText.length());
      }

      if (localName.equals("lineitem")){
         double extendedPrice = quantityInt * unitpriceDbl;
         System.out.println("Item: "+itemname+" ("+itemid+") "+quantity+
                                         " @ "+unitprice+" = "+extendedPrice);
         totalPrice = totalPrice + extendedPrice;
         itemname = "";
         quantity = "";
         quantityInt = 0;
         unitprice = "";
         unitpriceDbl = 0;
      }
   }

   public void characters (char ch[], int start, int length)
   {
       thisText.append(ch, start, length);
   }

}

Let's start with startElement(). If the element we've run across is the order element or the lineitem element, we pull the appropriate information from its attributes. In either case, we store the name of the element.

In most cases, the next event that will fire is the characters() event, as the content of the element is processed. One thing that's a little strange about SAX is that you never really know just how text will be processed. You might get it all in one big chunk, or you might get it in a series of smaller pieces. Because of this little idiosyncrasy, we need to append the text from each call to the thisText StringBuffer. When we get to the end of the element, the endElement() method executes, and we can check (and clear) the contents of the StringBuffer.
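
The chunking behavior can be simulated without a parser by calling the handler methods directly. The sketch below is illustrative only; TextCollector and getLastText() are hypothetical names, and the split point is arbitrary.

```java
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private String lastText = "";

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the slice the parser points at; the array may hold more.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element is complete, so the buffer now holds all of its text.
        lastText = buffer.toString().trim();
        buffer.setLength(0);   // clear the buffer for the next element
    }

    public String getLastText() { return lastText; }

    public static void main(String[] args) {
        TextCollector handler = new TextCollector();
        char[] data = "3/4\" Hex Bolt".toCharArray();
        // Simulate a parser that delivers the text in two pieces.
        handler.characters(data, 0, 4);
        handler.characters(data, 4, data.length - 4);
        handler.endElement("", "item", "item");
        System.out.println(handler.getLastText());   // prints: 3/4" Hex Bolt
    }
}
```

Had the handler used only the most recent characters() call, it would have kept just the second piece of the text.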

Note that our method of saving the "current" element only works because we're only looking for the text children of simple elements. If we needed to track multiple levels of elements, we'd have to find another way (or use another way of parsing the document, such as DOM). In this case, though, it's sufficient, so as each element closes, we check to see what it was and perform the appropriate actions. If it was an item, quantity, or unitprice element, we simply store the appropriate values. If, on the other hand, it's the end of a lineitem element, we perform the appropriate calculations, display the information for that item, and reinitialize the variables.
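
One common way to track multiple levels while staying with SAX is to keep a stack of the currently open elements. This is a sketch of that idea rather than anything the order application requires; PathTracker and paths() are hypothetical names, and it only handles text inside leaf elements.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathTracker extends DefaultHandler {

    private final Deque<String> open = new ArrayDeque<>();     // root first, innermost last
    private final StringBuilder text = new StringBuilder();
    private final List<String> found = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        text.setLength(0);   // discard white space seen before this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // The stack still includes the element being closed.
            found.add("/" + String.join("/", open) + " = " + value);
        }
        text.setLength(0);
        open.removeLast();
    }

    // Hypothetical helper: returns "path = text" for every element with text.
    public static List<String> paths(String xml) throws Exception {
        PathTracker handler = new PathTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><lineitem><item>Condenser</item></lineitem>"
                   + "<delivery>Overnight</delivery></order>";
        for (String line : paths(xml)) {
            System.out.println(line);
        }
        // prints:
        // /order/lineitem/item = Condenser
        // /order/delivery = Overnight
    }
}
```

With the stack available, a handler can distinguish an item inside a lineitem from an item that appears elsewhere in the document.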

Executing the OrderInfo application displays the following result:

Order THX1138 for customer 3263827
Item: 3/4" Hex Bolt (C33) 36 @ .35 = 12.6
Item: Condenser (M48) 1 @ 2200 = 2200.0
Order total: 2212.6

The ErrorHandler can be complex, or it can be simple, as it is here, simply parroting any error messages:

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.SAXParseException;

public class ErrorProcessor extends DefaultHandler
{

    public ErrorProcessor ()
    {
        super();
    }

    public void error (SAXParseException e) {
        System.out.println("Error: "+e.getMessage());
    }

    public void fatalError (SAXParseException e) {
        System.out.println("Fatal Error: "+e.getMessage());
    }

    public void warning (SAXParseException e) {
        System.out.println("Warning: "+e.getMessage());
    }

}
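
A slightly fuller handler might also report where each problem occurred and stop on fatal errors. The following is a sketch of one way to do that, assuming a JAXP parser; LocatingErrorHandler and check() are hypothetical names and not part of the application above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class LocatingErrorHandler implements ErrorHandler {

    private final List<String> messages = new ArrayList<>();

    // SAXParseException carries the position at which the parser noticed the problem.
    private String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
             + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) { messages.add(describe("Warning", e)); }

    @Override
    public void error(SAXParseException e) { messages.add(describe("Error", e)); }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add(describe("Fatal error", e));
        throw e;   // the document is not well formed, so stop parsing
    }

    public List<String> getMessages() { return messages; }

    // Hypothetical helper: parses a string and returns whatever was reported.
    public static List<String> check(String xml) throws Exception {
        LocatingErrorHandler handler = new LocatingErrorHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do here.
        }
        return handler.getMessages();
    }

    public static void main(String[] args) throws Exception {
        // The <item> element is never closed, so the parser reports a fatal error.
        for (String message : check("<order><item></order>")) {
            System.out.println(message);
        }
    }
}
```

The exact wording of each message comes from the parser and varies between implementations.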

Overall, this is a fairly simple explanation of what SAX can do; there are a significant number of events we haven't even touched on. In addition, SAX enables you to "chain" handlers together in order to "filter" the events in the stream for some extremely powerful and flexible applications.
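
As a rough illustration of filtering, SAX provides the XMLFilterImpl helper class, which sits between a parser and a handler and passes events along by default. The sketch below drops delivery elements from the stream before the handler sees them; DeliveryFilter and elementNames() are hypothetical names chosen for this example, and a JAXP parser is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class DeliveryFilter extends XMLFilterImpl {

    private int skipDepth = 0;   // greater than 0 while inside a delivery element

    public DeliveryFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || qName.equals("delivery")) {
            skipDepth++;         // swallow this element and anything nested in it
            return;
        }
        super.startElement(uri, localName, qName, atts);   // pass the event along
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Hypothetical helper: lists the element names that survive the filter.
    public static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader filtered = new DeliveryFilter(parser);   // the filter wraps the parser
        filtered.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                names.add(q);
            }
        });
        filtered.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item>Bolt</item><delivery>Overnight</delivery></order>";
        System.out.println(elementNames(xml));   // prints: [order, item]
    }
}
```

Because a filter is itself an XMLReader, one filter can wrap another, which is how longer chains are built.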

SAX is, in many cases, faster and more efficient than DOM, because it only deals with the information that's relevant at that particular moment rather than keeping the entire tree in memory at once. It may take a little getting used to, but you'll find that it can be an extremely versatile item in your toolbox.
