Using HC
With regular SAX parsing, you write a ContentHandler. The parser fires events on the ContentHandler as it decodes the document. HC retains the event-driven approach, but uses more abstract application handlers.
As the name indicates, application handlers contain application code only. No stacks, no state tracking. Application handlers associate events with XPaths. HC uses a precompiler to generate the state tracking code automatically from the list of XPaths. Automatically is the operative word here.
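To make the contrast concrete, here is a minimal sketch of the kind of hand-written state tracking that a plain SAX ContentHandler needs just to collect the text of articleinfo/title. It uses only the JDK; the class name ManualTitleHandler and the method extractTitle are hypothetical names chosen for illustration and are not part of HC.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical hand-written handler (not part of HC): it tracks the
// open elements itself in order to recognize articleinfo/title.
class ManualTitleHandler extends DefaultHandler {
    // Open elements, innermost first.
    private final Deque<String> stack = new ArrayDeque<>();
    private final StringBuilder title = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();
    }

    @Override
    public void characters(char[] ch, int offset, int len) {
        if (inTitle()) {
            title.append(ch, offset, len);
        }
    }

    // True when the current element is a title directly inside articleinfo.
    private boolean inTitle() {
        if (stack.size() < 2) {
            return false;
        }
        Iterator<String> it = stack.iterator();
        return "title".equals(it.next()) && "articleinfo".equals(it.next());
    }

    // Parses the XML text and returns the content of articleinfo/title.
    static String extractTitle(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for localName to be reported
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ManualTitleHandler handler = new ManualTitleHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.title.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<article xmlns='http://ananas.org/2002/docbook'>"
                   + "<articleinfo><title>HC</title></articleinfo>"
                   + "<para>HC unlocks the fun in SAX parsing.</para></article>";
        System.out.println(extractTitle(xml)); // prints HC
    }
}
```

The stack, the push and pop calls, and the inTitle() test are pure bookkeeping. This is the part an HC application handler leaves to the generated code.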
Application Handler Example
Let's illustrate HC programming with an example (available in the HC distribution). Suppose you want to format the following XML document to HTML:
<?xml version="1.0"?>
<article xmlns="http://ananas.org/2002/docbook">
  <articleinfo>
    <title>HC</title>
  </articleinfo>
  <para><ulink uri="http://www.ananas.org/hc">HC</ulink> unlocks the fun in SAX parsing.</para>
  <para>HC is distributed as open source from <ulink uri="http://www.ananas.org">ananas.org</ulink>. </para>
</article>
With HC, you can write the following application handler:
package org.ananas.hc.test;

import java.io.*;
import org.xml.sax.*;
import org.ananas.hc.*;
import org.xml.sax.helpers.*;

/**
 * @xmlns http://ananas.org/2002/docbook
 */
public class DocbookHandler
   implements org.ananas.hc.HCHandler {
   protected PrintWriter writer;
   protected String title;

   public DocbookHandler() {
      this(null);
   }

   public DocbookHandler(PrintWriter writer) {
      // fall back to the console when no writer is given
      if(writer == null)
         this.writer = new PrintWriter(new OutputStreamWriter(System.out));
      else
         this.writer = writer;
   }

   /**
    * @xpath /
    */
   public void startHTML() {
      title = null;
      writer.println("<html>");
   }

   /**
    * @xpath /
    */
   public void endHTML() {
      writer.println("</body>");
      writer.print("</html>");
      writer.flush();
   }

   /**
    * @xpath articleinfo
    */
   public void endInfo() {
      writer.println("<body>");
   }

   /**
    * @xpath para
    */
   public void startPara() {
      writer.print("<p>");
   }

   /**
    * @xpath para
    */
   public void endPara() {
      writer.println("</p>");
   }

   /**
    * @xpath para
    * @xpath ulink
    */
   public void characters(char[] ch,int offset,int len) {
      writer.write(ch,offset,len);
   }

   /**
    * @xpath articleinfo/title
    */
   public void charactersTitle(String title) {
      this.title = title;
   }

   /**
    * @xpath ulink
    */
   public void startULink(Attributes atts) {
      writer.print("<a href='");
      writer.print(atts.getValue("uri"));
      writer.print("'>");
   }

   /**
    * @xpath ulink
    */
   public void endULink() {
      writer.print("</a>");
   }

   /**
    * @xpath articleinfo/title
    */
   public void endArticleTitle() {
      // the HTML document title belongs in <head>, not <header>
      writer.print("<head><title>");
      writer.print(title);
      writer.println("</title></head>");
      writer.print("<h1>");
      writer.print(title);
      writer.println("</h1>");
   }

   public static void main(String[] params) {
      try {
         XMLReader xparser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");
         // params[0] is the input document, params[1] the optional output file
         if(params.length > 1) {
            PrintWriter writer =
               new PrintWriter(new FileWriter(params[1]));
            xparser.setContentHandler(new XPathHandler(
               new DocbookHandler(writer)));
         }
         else
            xparser.setContentHandler(new XPathHandler(
               new DocbookHandler()));
         xparser.parse(new InputSource(params[0]));
      }
      catch(SAXException e) {
         // report the wrapped exception when there is one
         if(e.getException() != null)
            e.getException().printStackTrace();
         else
            e.printStackTrace();
      }
      catch(IOException e) {
         e.printStackTrace();
      }
   }
}
Package Imports
Let's review this listing step-by-step. The listing imports the usual SAX packages but it also needs the HC package:
import org.ananas.hc.*;
Namespace Declarations
Next the application handler declares zero, one or more XML namespaces. HC uses special Javadoc-like comments to declare namespaces and XPaths. Using Javadoc-like comments to annotate the listing means that you do not need to learn a new programming language.
/**
 * @xmlns http://ananas.org/2002/docbook
 */
The previous comment declares the default namespace (no prefix). To associate a prefix with the namespace, simply add the prefix before the namespace URI. Of course, you can declare several namespaces in one comment:
/**
 * @xmlns http://ananas.org/2002/docbook
 * @xmlns ps http://psol.com/2002/extensions
 */
Application Handler
The application handler implements the HCHandler interface. The interface defines no methods, but serves as a flag for the HC precompiler:
public class DocbookHandler implements org.ananas.hc.HCHandler
XPath Annotations
Finally, several methods in the application handler are annotated with more Javadoc-like comments:
/**
 * @xpath articleinfo/title
 */
public void charactersTitle(String title) {
   this.title = title;
}
Note that version 0.4 of HC supports a small subset of the XPath language. Specifically, it only accepts element names and the / separator. Conditions (para[last()]), attributes (ulink/@uri), and much more are not recognized. Future versions should be more powerful.
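To give a feel for how small this subset is, the following sketch matches a path made only of element names and the / separator against the list of currently open elements. It is an illustration, not HC's generated code: PathMatcher and matches are hypothetical names, and the rule that a relative path matches the innermost open elements (in the manner of XSLT patterns) is an assumption about the matching semantics.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of HC) for paths that contain only
// element names and the '/' separator.
class PathMatcher {
    // openElements lists the open elements, outermost first.
    // Assumption: a relative path matches the innermost elements,
    // an absolute path must match the whole list, and "/" alone
    // matches the document level (no open element).
    static boolean matches(String path, List<String> openElements) {
        boolean absolute = path.startsWith("/");
        String trimmed = absolute ? path.substring(1) : path;
        if (trimmed.isEmpty()) {
            return openElements.isEmpty();
        }
        String[] steps = trimmed.split("/");
        int offset = openElements.size() - steps.length;
        if (offset < 0 || (absolute && offset != 0)) {
            return false;
        }
        for (int i = 0; i < steps.length; i++) {
            if (!steps[i].equals(openElements.get(offset + i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> open = Arrays.asList("article", "articleinfo", "title");
        System.out.println(matches("articleinfo/title", open)); // prints true
        System.out.println(matches("para", open));              // prints false
    }
}
```

Predicates and attribute steps would need a real XPath parser, which helps explain why they are outside this subset.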
As you can imagine, HC calls the method when the parser reaches an element that matches the corresponding XPath. A SAX parser offers start, end, and characters events. Likewise, HC can attach up to three methods to each XPath:
Start events take the form of startXXX() methods
End events take the form of endXXX() methods
Characters events take the form of charactersXXX() methods
As an added convenience, HC gives you lots of control over the parameters you use. You need to specify only the parameters you actually use. In practice, most methods associated with start or end events need no parameters, but they accept the usual namespaceURI, localName, and attributes.
Likewise, as illustrated in the above listing, methods for characters events accept a string or a character array.
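The difference between the two forms matters because a SAX parser is free to deliver the text of one element in several characters() calls. A method that receives the text as a single string therefore implies buffering somewhere. The sketch below shows that buffering with plain SAX; TextBuffer and collect are hypothetical names, and whether HC buffers in exactly this way is an assumption, since the article does not show its run-time.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration (not HC code): gathers the character
// chunks of an element into one String.
class TextBuffer extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private String text; // complete text of the most recently closed element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Simplification: assumes the element has no child elements.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int offset, int len) {
        buffer.append(ch, offset, len); // may be called several times
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        text = buffer.toString(); // the whole text is known only here
    }

    // Simulates a parser that splits the text into the given chunks.
    static String collect(String... chunks) {
        TextBuffer handler = new TextBuffer();
        handler.startElement("", "title", "title", null);
        for (String chunk : chunks) {
            char[] ch = chunk.toCharArray();
            handler.characters(ch, 0, ch.length);
        }
        handler.endElement("", "title", "title");
        return handler.text;
    }

    public static void main(String[] args) {
        System.out.println(collect("H", "C")); // prints HC
    }
}
```

The character array form suits streaming output, as in the characters() method of the listing, while the string form suits values you want to store, such as the title.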
Calling HC
In the above listing, the main method illustrates calling HC. The application creates a SAX XMLReader and sets its ContentHandler to the HC-provided XPathHandler. The XPathHandler constructor takes an application handler as its parameter. Next, the application invokes parse() as usual:
xparser.setContentHandler(new XPathHandler(
   new DocbookHandler()));
xparser.parse(new InputSource(params[0]));
That's almost it! As you can see, an HC application handler replaces state tracking code with XPath annotations. That lets the programmer concentrate on his or her application logic, not on the specifics of SAX parsing.
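The listing obtains its XMLReader from XMLReaderFactory with a hard-coded Xerces class name, which requires Xerces on the classpath; XMLReaderFactory has also been deprecated since Java 9. As a hedged alternative, the sketch below gets the reader through JAXP's SAXParserFactory. ReaderSetup, newReader, and countElements are hypothetical names, and a small counting handler stands in for XPathHandler so that the example runs without the HC library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical setup code (not part of HC).
class ReaderSetup {
    // Creates a namespace-aware XMLReader from whichever parser JAXP finds.
    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP factories are not namespace-aware by default.
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Parses the XML text and returns the number of elements it contains.
    static int countElements(String xml) {
        final int[] count = {0};
        try {
            XMLReader reader = newReader();
            // In an HC project, the handler set here would instead be
            // new XPathHandler(new DocbookHandler()), as in the listing.
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<article><para/><para/></article>")); // prints 3
    }
}
```

Only the way the reader is created changes; setContentHandler() and parse() are called exactly as before.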
Building the Project with ANT
Compiling an HC project is a two-step process. First, you run the HC precompiler to create so-called table classes. Table classes hold information that the HC run-time needs to recognize XPaths. Next, you compile the project with a regular Java compiler.
I recommend using ANT to automate the two-step process. If you are not familiar with it, ANT is handy to automate a Java build process. It is similar to the popular make utility, but better suited for Java programmers.
The following is an ANT 1.4.1 build file that compiles the previous project:
<?xml version="1.0"?>
<!-- Ant 1.4.1 build file: jakarta.apache.org -->
<project name="HC" default="build" basedir=".">
  <property name="classpath"
            value="lib/xerces.jar;lib/junit.jar;lib/hc.jar"/>

  <target name="build" depends="precompile">
    <javac destdir="classes" classpath="${classpath}">
      <src path="src"/>
      <src path="autosrc"/>
    </javac>
  </target>

  <target name="precompile" depends="prepare">
    <javadoc sourcepath="src" packagenames="*"
             classpath="${classpath}">
      <doclet name="org.ananas.hc.compiler.CompilerDoclet"
              path="${classpath}">
        <param name="-d" value="autosrc"/>
      </doclet>
    </javadoc>
  </target>

  <target name="prepare">
    <mkdir dir="classes"/>
    <mkdir dir="autosrc"/>
  </target>
</project>
To use this file, open a console, switch to the directory where you installed the package, and issue the ant command.
The precompile step is unique to HC projects. It runs the Javadoc tool with the HC doclet, because HC uses Javadoc as its underlying compiler. The -d parameter specifies the output directory:
<target name="precompile" depends="prepare">
  <javadoc sourcepath="src" packagenames="*"
           classpath="${classpath}">
    <doclet name="org.ananas.hc.compiler.CompilerDoclet"
            path="${classpath}">
      <param name="-d" value="autosrc"/>
    </doclet>
  </javadoc>
</target>