SAX: The Power API
SAX was developed by the members of the XML-DEV mailing list as a standard and simple API for event-based parsers. SAX is short for the Simple API for XML.
SAX was originally defined for Java but it is also available for Python, Perl, C++, and COM (Windows objects). More language bindings are sure to follow. Furthermore, through COM, SAX parsers are available to all Windows programming languages, including Visual Basic and Delphi.
Currently SAX is edited by David Megginson (but he has announced that he will retire) and published at http://www.megginson.com/SAX. Unlike DOM, SAX is not endorsed by an official standardization body, but it is widely used and is considered a de facto standard.
As you have seen, DOM is the preferred API in a browser. SAX is typically used outside the browser, so the examples in this chapter are written in Java. If you feel you need a crash course on Java, turn to Appendix A.
Parsers that support SAX include Xerces, the Apache parser that was formerly the IBM parser (available from xml.apache.org); MSXML, the Microsoft parser (available from msdn.microsoft.com); and XDK, the Oracle parser (available from technet.oracle.com/tech/xml). These parsers are the most flexible because they also support DOM.
A few parsers offer only SAX, such as James Clark's XP (available from http://www.jclark.com), Ælfred (available from home.pacbell.net/david-b/xml), and ActiveSAX from Vivid Creations (available from http://www.vivid-creations.com).
Getting Started with SAX
Listing 8.2 is a Java application that finds the cheapest offering in Listing 8.1. The application prints the best price and the name of the vendor.
Listing 8.2: Cheapest.java
/*
 * XML By Example, chapter 8: SAX
 */

package com.psol.xbe2;

import org.xml.sax.*;
import java.io.IOException;
import org.xml.sax.helpers.*;
import java.text.MessageFormat;

/**
 * SAX event handler to find the cheapest offering in a list of
 * prices.
 */
public class Cheapest extends DefaultHandler {
    /**
     * constants
     */
    protected static final String
        NAMESPACE_URI = "http://www.psol.com/xbe2/listing8.1",
        MESSAGE = "The cheapest offer is from {0} ({1,number,currency})",
        PARSER_NAME = "org.apache.xerces.parsers.SAXParser";

    /**
     * properties we are collecting: cheapest price & vendor
     */
    protected double min = Double.MAX_VALUE;
    protected String vendor = null;

    /**
     * startElement event: the price list is stored as price-quote
     * elements with price and vendor attributes
     * @param uri namespace URI
     * @param name local name
     * @param qualifiedName qualified name (with prefix)
     * @param attributes attributes list
     */
    public void startElement(String uri,
                             String name,
                             String qualifiedName,
                             Attributes attributes) {
        if(uri.equals(NAMESPACE_URI) && name.equals("price-quote")) {
            String attribute = attributes.getValue("","price");
            if(null != attribute) {
                double price = toDouble(attribute);
                if(min > price) {
                    min = price;
                    vendor = attributes.getValue("","vendor");
                }
            }
        }
    }

    /**
     * helper method: turn a string into a double
     * @param string number as a string
     * @return the number as a double, or 0.0 if it cannot convert
     * the number
     */
    protected static final double toDouble(String string) {
        try {
            return Double.parseDouble(string);
        }
        catch(NumberFormatException e) {
            // the string is not a valid number
            return 0.0;
        }
    }

    /**
     * main() method
     * decodes command-line parameters and invokes the parser
     * @param args command-line arguments
     * @throws IOException if the price list cannot be read
     * @throws SAXException if the parser reports an error
     */
    public static void main(String[] args)
        throws IOException, SAXException {
        // command-line arguments
        if(args.length < 1) {
            System.out.println("java com.psol.xbe2.Cheapest file");
            return;
        }

        // creates the event handler
        Cheapest cheapest = new Cheapest();

        // creates the parser
        XMLReader parser =
            XMLReaderFactory.createXMLReader(PARSER_NAME);
        parser.setFeature("http://xml.org/sax/features/namespaces",
                          true);
        parser.setContentHandler(cheapest);

        // invokes the parser against the price list
        parser.parse(args[0]);

        // prints the results
        Object[] objects = new Object[] {
            cheapest.vendor,
            new Double(cheapest.min)
        };
        System.out.println(MessageFormat.format(MESSAGE,objects));
    }
}
Compiling the Example
To compile this application, you need a Java Development Kit (JDK) for your platform. For this example, the Java Runtime is not enough. You can download the JDK from java.sun.com.
You must also download the listings available from http://www.marchal.com or http://www.quepublishing.com. The download includes Xerces. As always, I will post updates, if appropriate, on the Web site.
If you have problems with a listing, make sure you visit http://www.marchal.com or http://www.quepublishing.com.
Save Listing 8.2 in a file called Cheapest.java in a directory called src. Go to the DOS prompt, change to the directory that contains src, and compile by issuing the following commands:
mkdir classes
set classpath=classes;lib\xerces.jar
javac -d classes src\Cheapest.java
The compilation will install the Java program in the classes directory. These commands assume that you have installed Xerces in the lib directory and Listing 8.2 in the src directory. You might have to adapt the classpath (second command) if you installed the parser in a different directory.
To run the application against the price list, issue the following command:
java com.psol.xbe2.Cheapest data\pricelist.xml
CAUTION
Be warned that Java has difficulty with paths containing spaces. If Cheapest complains that it cannot find the file, check that the directory does not contain a space somewhere.
The result should be
The cheapest offer is from XMLi ($699.00)
This command assumes that Listing 8.1 is in a file called data\pricelist.xml. Again, you might need to adapt the path to your system.
CAUTION
The programs in this chapter do essentially no error checking. It simplifies them and helps concentrate on the XML aspects. It also means that if you type incorrect parameters, they crash.
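If you want a little more safety than the listings provide, the following is a minimal sketch of what basic error checking around parse() could look like. It is not part of the book's listings. The class name SafeParse and its check() method are hypothetical, and the sketch assumes the parser bundled with the JDK (obtained through JAXP) rather than Xerces, so that it runs without any extra download. It relies on SAXParseException, which SAX defines to report the line and column of a problem.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: describes a parse failure instead of crashing. */
final class SafeParse {

    /** Returns null if the document parses, otherwise a one-line description. */
    static String check(String xml) {
        try {
            // Assumption: the JDK's built-in parser, obtained through JAXP.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            // DefaultHandler rethrows fatal errors and ignores
            // warnings and recoverable errors.
            DefaultHandler handler = new DefaultHandler();
            parser.setContentHandler(handler);
            parser.setErrorHandler(handler);

            parser.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // The parser knows where in the document the problem is.
            return "XML error at line " + e.getLineNumber()
                 + ", column " + e.getColumnNumber()
                 + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return "Cannot parse: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // One well-formed document and one with a missing end tag.
        String[] samples = { "<a><b/></a>", "<a><b></a>" };
        for (String xml : samples) {
            String problem = check(xml);
            System.out.println(xml + " -> " + (problem == null ? "ok" : problem));
        }
    }
}
```

The same try/catch structure could wrap the call to parse(args[0]) in Listing 8.2. The exact wording of the error message depends on the parser.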
Remember that you cannot compile this example unless you have installed a Java Development Kit.
Finally, an error such as
src\Cheapest.java:7: Package org.xml.sax not found in import.
import org.xml.sax.*;
or
Can't find class com/psol/xbe2/Cheapest or something it requires
is most likely caused by one of the following:
- The classpath (second command, classes;lib\xerces.jar) is incorrect.
- You entered an incorrect class name in the last command (com.psol.xbe2.Cheapest).
The Event Handler Step by Step
Events in SAX are defined as methods attached to specific Java interfaces. In this section, we will review Listing 8.2 step by step. The following section gives you more information on the main SAX interfaces.
The easiest way to declare an event handler is to inherit from the SAX-provided DefaultHandler:
public class Cheapest extends DefaultHandler
This application implements only one event handler: startElement(), which the parser calls when it encounters a start tag. The parser will call startElement() for every start tag in the document: <xbe:price-list>, <xbe:product>, and <xbe:price-quote>.
In Listing 8.2, the event handler is only interested in price-quote, so it tests for it. The handler does nothing with events for other elements:
if(uri.equals(NAMESPACE_URI) && name.equals("price-quote")) {
    // ...
}
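To see the filtering at work, here is a small sketch that records every start tag the parser reports and counts how many pass the test above. It is an illustration only. The class ElementTrace is hypothetical, the SAMPLE document is a made-up stand-in for Listing 8.1 (only the element names, the namespace URI, and the XMLi quote come from this chapter), and the sketch assumes the parser bundled with the JDK instead of Xerces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: logs every start tag and counts price-quote elements. */
final class ElementTrace extends DefaultHandler {
    static final String NAMESPACE_URI = "http://www.psol.com/xbe2/listing8.1";

    // Made-up stand-in for Listing 8.1; "Vendor A" is not a real entry.
    static final String SAMPLE =
          "<xbe:price-list xmlns:xbe='" + NAMESPACE_URI + "'>"
        + "<xbe:product>Sample product</xbe:product>"
        + "<xbe:price-quote price='999.00' vendor='Vendor A'/>"
        + "<xbe:price-quote price='699.00' vendor='XMLi'/>"
        + "</xbe:price-list>";

    final List<String> seen = new ArrayList<>();
    int matches = 0;

    @Override
    public void startElement(String uri, String name,
                             String qualifiedName, Attributes attributes) {
        // Called for every start tag, whatever the element.
        seen.add(name);
        // Same test as in Listing 8.2: only price-quote is of interest.
        if (uri.equals(NAMESPACE_URI) && name.equals("price-quote")) {
            matches++;
        }
    }

    /** Parses the document and returns the handler that collected the events. */
    static ElementTrace trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        ElementTrace handler = new ElementTrace();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ElementTrace handler = trace(SAMPLE);
        // prints: start tags seen: [price-list, product, price-quote, price-quote]
        System.out.println("start tags seen: " + handler.seen);
        // prints: price-quote matches: 2
        System.out.println("price-quote matches: " + handler.matches);
    }
}
```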
TIP
Note that this is an event handler. It does not call the parser. In fact, it's just the opposite: the parser calls it.
If you're confused, think of AWT events. An event handler attached to, say, a button does not call the button. It waits for the button to be clicked.
When it finds a price-quote element, the event handler extracts the vendor name and the price from the list of attributes. Armed with this information, finding the cheapest product is a simple comparison:
String attribute = attributes.getValue("","price");
if(null != attribute) {
    double price = toDouble(attribute);
    if(min > price) {
        min = price;
        vendor = attributes.getValue("","vendor");
    }
}
Notice that the event handler receives the element name, namespace and attribute lists as parameters from the parser.
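The Attributes parameter deserves a closer look. The sketch below, which is not one of the book's listings, shows two ways to read it: by namespace URI and local name, as Listing 8.2 does, and by index. The class AttributeAccess and the one-quote sample document are hypothetical, and the "currency" attribute is deliberately absent to show what a lookup returns in that case. As before, the sketch assumes the parser bundled with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: reads the attributes of a price-quote element. */
final class AttributeAccess extends DefaultHandler {
    // Made-up one-quote document using the names from this chapter.
    static final String SAMPLE =
          "<xbe:price-list xmlns:xbe='http://www.psol.com/xbe2/listing8.1'>"
        + "<xbe:price-quote price='699.00' vendor='XMLi'/>"
        + "</xbe:price-list>";

    int count;
    String price;
    String vendor;
    String missing = "unset";

    @Override
    public void startElement(String uri, String name,
                             String qualifiedName, Attributes attributes) {
        if (!name.equals("price-quote")) {
            return;
        }
        // Lookup by name: unprefixed attributes belong to no namespace,
        // which is why the namespace URI is the empty string.
        price = attributes.getValue("", "price");
        vendor = attributes.getValue("", "vendor");
        // An attribute that is not present yields null, not an exception.
        missing = attributes.getValue("", "currency");

        // Lookup by index: useful when the attribute names are not known.
        count = attributes.getLength();
        for (int i = 0; i < count; i++) {
            System.out.println(attributes.getLocalName(i)
                               + " = " + attributes.getValue(i));
        }
    }

    /** Parses the document and returns the handler that read the attributes. */
    static AttributeAccess read(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        AttributeAccess handler = new AttributeAccess();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        AttributeAccess handler = read(SAMPLE);
        System.out.println("price by name: " + handler.price);
        System.out.println("vendor by name: " + handler.vendor);
        System.out.println("currency by name: " + handler.missing);
    }
}
```

This is why Listing 8.2 tests the price attribute against null before converting it.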
Let's now turn our attention to the main() method. It creates an event-handler object and a parser object:
Cheapest cheapest = new Cheapest();
XMLReader parser = XMLReaderFactory.createXMLReader(PARSER_NAME);
XMLReader and XMLReaderFactory are defined by SAX. An XMLReader is a SAX parser. The factory is a helper class to create XMLReaders.
main() sets a parser feature to request namespace processing and it registers the event handler with the parser. Finally, main() calls the parse() method with the URI to the XML file:
parser.setFeature("http://xml.org/sax/features/namespaces",true);
parser.setContentHandler(cheapest);
parser.parse(args[0]);
TIP
It is not required to set http://xml.org/sax/features/namespaces to true because the default value is true. However, I find it makes the code more readable.
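Listing 8.2 names the Xerces parser class explicitly. If Xerces is not on your classpath, one alternative is to ask the JAXP factory that ships with the JDK for an XMLReader. The sketch below is an assumption on my part, not the route the listing takes, and the class ReaderSetup is hypothetical. Note that a JAXP factory is not namespace-aware unless you ask for it, so with this route the explicit setting is more than a readability aid. The sketch also queries the feature with getFeature() instead of assuming its value.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical helper: obtains an XMLReader through JAXP instead of by class name. */
final class ReaderSetup {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    static XMLReader newReader()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP factories are documented as not namespace-aware by default,
        // so namespace processing is requested explicitly.
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader parser = newReader();
        // The class name printed here depends on the JDK in use.
        System.out.println("parser class: " + parser.getClass().getName());
        // Query the feature rather than assume its value.
        System.out.println("namespaces: " + parser.getFeature(NAMESPACES));
    }
}
```

The XMLReader returned here accepts the same setContentHandler() and parse() calls as the one created in Listing 8.2.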
The innocent-looking parse() method triggers parsing of the XML document which, in turn, calls the event handler. It is during the execution of this method that our startElement() method will be called. There's a lot happening behind the call to parse()!
Last but not least, main() prints the result:
Object[] objects = new Object[] {
    cheapest.vendor,
    new Double(cheapest.min)
};
System.out.println(MessageFormat.format(MESSAGE,objects));
Wait! When do cheapest.vendor and cheapest.min acquire their values? We don't set them explicitly in main()! True; that is the event handler's job. And the event handler is ultimately called by parse(). That's the beauty of event processing.
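If you want to convince yourself, print the fields before and after the call to parse(). The following sketch does just that. It is a trimmed-down variation on Listing 8.2, not a replacement for it: the class FieldsBeforeAfter is hypothetical, the in-memory SAMPLE document is a made-up stand-in for Listing 8.1 (only the XMLi quote comes from this chapter), and it assumes the parser bundled with the JDK instead of Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical variation on Cheapest that shows when the fields change. */
final class FieldsBeforeAfter extends DefaultHandler {
    static final String NAMESPACE_URI = "http://www.psol.com/xbe2/listing8.1";

    // Made-up stand-in for Listing 8.1; "Vendor A" is not a real entry.
    static final String SAMPLE =
          "<xbe:price-list xmlns:xbe='" + NAMESPACE_URI + "'>"
        + "<xbe:price-quote price='999.00' vendor='Vendor A'/>"
        + "<xbe:price-quote price='699.00' vendor='XMLi'/>"
        + "</xbe:price-list>";

    double min = Double.MAX_VALUE;
    String vendor = null;

    @Override
    public void startElement(String uri, String name,
                             String qualifiedName, Attributes attributes) {
        if (uri.equals(NAMESPACE_URI) && name.equals("price-quote")) {
            String attribute = attributes.getValue("", "price");
            if (attribute != null) {
                // The sample contains only valid numbers, so no
                // NumberFormatException handling is shown here.
                double price = Double.parseDouble(attribute);
                if (min > price) {
                    min = price;
                    vendor = attributes.getValue("", "vendor");
                }
            }
        }
    }

    /** Runs the parser; the handler's fields change only inside this call. */
    static void fill(FieldsBeforeAfter handler, String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setContentHandler(handler);
        parser.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        FieldsBeforeAfter handler = new FieldsBeforeAfter();
        // prints: before parse(): vendor=null
        System.out.println("before parse(): vendor=" + handler.vendor);
        fill(handler, SAMPLE);
        // prints: after parse(): vendor=XMLi, min=699.0
        System.out.println("after parse(): vendor=" + handler.vendor
                           + ", min=" + handler.min);
    }
}
```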