The ZwiftBooks Filter Solution
Let’s use SAX filters to help ZwiftBooks maximize its reuse of existing code. Figure 2 illustrates our problem. The company already has a SAX implementation in place that alerts the warehouse whenever an isbn attribute is found in an incoming XML document. Now a new arrangement has been made to provide services to an Asian book purchaser. The only problem is that this purchaser’s XML uses an asin attribute instead of isbn and prefixes every ISBN with XX. Since the attribute name isn’t isbn, it will slip past our handler. What we need to do is change the attribute name from asin to isbn and strip out the leading XX, a fine task for a SAX filter. By creating a SaxDataFilter that changes the attribute name and value, we can pass conforming XML to our original handler without changing existing code.
Figure 2 Using a SAX filter to modify the XML stream.
To make working with attributes fairly painless, SAX2 provides the AttributesImpl helper class (in org.xml.sax.helpers). It implements the read-only Attributes interface and adds methods such as setQName() and setValue(), so that attributes can be modified or reused. The AttributesImpl class can be used in two different ways:
- Take a persistent snapshot of an Attributes object in a startElement event.
- Construct or modify an Attributes object in a SAX2 driver or filter.
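To make the two uses concrete, here is a short sketch. The AttributesDemo class and its method names are hypothetical; the AttributesImpl copy constructor and addAttribute(uri, localName, qName, type, value) are part of the standard org.xml.sax.helpers API. The copy matters because the SAX contract leaves the Attributes object passed to startElement undefined once that method returns.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class illustrating the two AttributesImpl use cases.
class AttributesDemo {

    // Use 1: persistent snapshot. The parser may reuse the Attributes
    // object after startElement returns, so copy it to keep it.
    static Attributes snapshot(Attributes incoming) {
        return new AttributesImpl(incoming);
    }

    // Use 2: construct an Attributes object from scratch, as a SAX2
    // driver or filter might before passing it downstream.
    static Attributes buildBookAttributes(String isbn) {
        AttributesImpl atts = new AttributesImpl();
        // addAttribute(uri, localName, qName, type, value)
        atts.addAttribute("", "isbn", "isbn", "CDATA", isbn);
        return atts;
    }
}
```

A snapshot taken this way keeps its values even if the original object is later cleared or reused.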
We use this class to modify both the asin attribute name and its value in the filter example in Listing 2.
Listing 2 SAX handler code for a SAX filter that modifies attribute name and value.
1. import org.xml.sax.helpers.XMLFilterImpl;
2. import org.xml.sax.helpers.AttributesImpl;
3. import org.xml.sax.Attributes;
4. import org.xml.sax.SAXException;
5.
6.
7.
8. public class SaxDataFilter extends XMLFilterImpl {
9.   @Override
10.   public void startElement (String namespaceUri, String localName,
11.                             String qualifiedName,
12.                             Attributes attributes)
13.                             throws SAXException {
14.
15.     AttributesImpl newAttributes = null; // set up local var
16.
17.     // check to see which element we're looking at
18.     if (qualifiedName.equals("book")) {
19.
20.       // copy all the attributes for this element into a new
21.       // structure that allows us to retrieve and modify attributes
22.       newAttributes = new AttributesImpl(attributes);
23.
24.       String asiabookcode = newAttributes.getValue("asin");
25.       if (asiabookcode != null && asiabookcode.startsWith("XX")) {
26.         // remove leading two characters (XX) from the book code
27.         String isbnCode = asiabookcode.substring(2);
28.
29.         // determine the index of the incoming asin attribute
30.         int idx = newAttributes.getIndex("asin");
31.
32.         // change both the attribute name and the value
33.         newAttributes.setQName(idx, "isbn");
34.         newAttributes.setValue(idx, isbnCode);
35.       }
36.     }
37.
38.     // pass original parameter data (except for attributes) to the
39.     // next handler in the pipeline: newAttributes for <book> elements,
40.     // the untouched original attributes for everything else
41.     super.startElement(namespaceUri, localName,
42.         qualifiedName, (newAttributes != null) ? newAttributes : attributes);
43.   }
44.
45.
46. }
We define newAttributes on line 15, and on line 22 we instantiate it with a copy of all incoming attributes. We then extract the value of the asin attribute (line 24) and strip off the leading XX (line 27). With the new ISBN value in hand, we change the attribute’s name and value (lines 33–34). Finally, on line 41, we pass the parameters we received to the next handler in the chain by calling super.startElement(). XMLFilterImpl’s own startElement() simply forwards the event to whatever ContentHandler has been registered with the filter. Note that for a book element we don’t pass the original attributes parameter but rather newAttributes, the AttributesImpl instance that we created and modified.
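Because XMLFilterImpl implements ContentHandler, a filter like this can be exercised without a parser at all: register a downstream handler with setContentHandler(), call startElement() by hand, and inspect what arrives. The sketch below does that with a self-contained, hypothetical generalization of SaxDataFilter (RenameAttributeFilter, which takes the element name, old and new attribute names, and prefix as constructor arguments) plus a hypothetical CapturingHandler and FilterDemo driver. Unlike Listing 2, it also updates the attribute’s local name, which only matters if a downstream handler looks attributes up by namespace URI and local name.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical generalization of SaxDataFilter: on one element, renames
// one attribute and strips a fixed prefix from its value.
class RenameAttributeFilter extends XMLFilterImpl {
    private final String element, from, to, prefix;

    RenameAttributeFilter(String element, String from, String to, String prefix) {
        this.element = element;
        this.from = from;
        this.to = to;
        this.prefix = prefix;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        Attributes out = atts; // default: forward the originals untouched
        int idx = atts.getIndex(from); // lookup by qualified name
        if (qName.equals(element) && idx >= 0) {
            AttributesImpl copy = new AttributesImpl(atts);
            String value = copy.getValue(idx);
            if (value.startsWith(prefix)) {
                value = value.substring(prefix.length());
            }
            copy.setLocalName(idx, to); // keep localName and qName in step
            copy.setQName(idx, to);
            copy.setValue(idx, value);
            out = copy;
        }
        // XMLFilterImpl forwards to the handler set via setContentHandler
        super.startElement(uri, localName, qName, out);
    }
}

// Hypothetical downstream handler that records the last isbn it saw.
class CapturingHandler extends DefaultHandler {
    String lastIsbn;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if (qName.equals("book")) {
            lastIsbn = atts.getValue("isbn");
        }
    }
}

class FilterDemo {
    // Drives the filter by hand, with no parser involved.
    static String filterOne(String asinValue) throws SAXException {
        RenameAttributeFilter filter =
            new RenameAttributeFilter("book", "asin", "isbn", "XX");
        CapturingHandler handler = new CapturingHandler();
        filter.setContentHandler(handler);

        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "asin", "asin", "CDATA", asinValue);
        filter.startElement("", "book", "book", atts);
        return handler.lastIsbn;
    }
}
```

Forwarding the original attributes when nothing matches keeps the filter transparent for every other element in the stream.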
Listing 3 shows the SaxIsbnHandler, which remains unchanged from the previous article. It now sits at the end of the filter chain and reacts to incoming ISBNs.
Listing 3 A basic SAX handler that alerts the warehouse when an incoming ISBN is found.
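import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxIsbnHandler extends DefaultHandler {

  public void startElement(String namespaceUri, String localName,
                           String qualifiedName, Attributes attributes)
                           throws SAXException {

    String isbnNumber = null;

    // if we have a <book>, retrieve the attributes
    if (qualifiedName.equals("book")) {

      int numAttributes = attributes.getLength();

      if (numAttributes > 0) {

        // step through each attribute, looking for isbn
        for (int i = 0; i < numAttributes; i++) {
          if (attributes.getQName(i).equals("isbn")) {
            isbnNumber = attributes.getValue(i);
            alertWarehouse(isbnNumber);
            break;
          }
        } // end for
      }
    } // end if book
  }

  // hypothetical placeholder: the real warehouse-alert logic from the
  // previous article is not reproduced here
  private void alertWarehouse(String isbn) {
    System.out.println("Warehouse alert for ISBN " + isbn);
  }
}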
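The loop in Listing 3 walks the attribute list by index. The Attributes interface also offers a lookup by qualified name, getValue(String), which returns null when the attribute is absent, so the same check can be written more compactly. The sketch below is a hypothetical variant (CompactIsbnHandler), not the original handler; it records ISBNs in a list in place of alertWarehouse(), whose implementation is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical compact variant of SaxIsbnHandler.
class CompactIsbnHandler extends DefaultHandler {
    // Stand-in for alertWarehouse(): just remember each ISBN seen.
    final List<String> alerts = new ArrayList<>();

    @Override
    public void startElement(String namespaceUri, String localName,
                             String qualifiedName, Attributes attributes) {
        if (qualifiedName.equals("book")) {
            // getValue(qName) returns null if there is no isbn attribute
            String isbnNumber = attributes.getValue("isbn");
            if (isbnNumber != null) {
                alerts.add(isbnNumber);
            }
        }
    }
}
```

Both versions react only to an attribute whose qualified name is exactly isbn, which is why the filter has to rename asin before the event reaches the handler.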
Now let’s pull it all together. Listing 4 shows how the filter chain is constructed.
Listing 4 SAX filter chain construction.
1. import javax.xml.parsers.*;
2. import org.xml.sax.*;
3. import org.xml.sax.helpers.*;
4.
5. public class SaxIsbnFilterMain {
6.
7.   public static void main (String [] args) {
8.
9.     String filename = "AsiaBooks.xml";
10.
11.     // define the two parts of the filter chain
12.     DefaultHandler isbnHandler = new SaxIsbnHandler(); // original
13.     SaxDataFilter isbnFilter = new SaxDataFilter();   // filter
14.
15.     try {
16.
17.       XMLReader reader =
18.           XMLReaderFactory.createXMLReader(); // default SAX2 driver
19.
20.
21.       // register the dummy reader with the filter
22.       isbnFilter.setParent(reader);
23.
24.       // note: this sets the filter's own (downstream) handler;
25.       // parse() later installs the filter as the reader's handler
26.       isbnFilter.setContentHandler(isbnHandler);
27.
28.       // XML Data Source
29.       InputSource inputSource = new InputSource(filename);
30.
31.       // start the pipeline rolling - tell filter to parse
32.       isbnFilter.parse(inputSource);
33.
34.
35.     } catch(Exception e) {
36.       String errorMessage =
37.           "Error parsing " + filename + ": " + e;
38.       System.err.println(errorMessage);
39.       e.printStackTrace();
40.     }
41.   }
42.
43. }
Setting up the filter chain correctly is the trickiest part of using SAX filters, because the mechanism is not as intuitive as we might like. Let’s walk through the process step by step:
- Line 12 creates an instance of the existing handler—the one that alerts the warehouse when an ISBN is found.
- Line 13 creates the filter instance, the one that renames asin to isbn and strips the XX prefix.
- Line 17 creates a "blank" XMLReader instance using the XMLReaderFactory.
- Line 22 tells the filter to use the blank XMLReader as its parent, the source of the events it will filter.
- Line 26 sets our isbnHandler as the ContentHandler of the filter. But wait a minute. Doesn’t our filter already have handler code? Isn’t that what Listing 2 is all about? Yes, but calling setContentHandler() on a filter does not replace the filter’s own startElement() code. It registers isbnHandler (Listing 3) as the downstream handler, the one the filter forwards events to when it calls super.startElement(). The filter’s own code (Listing 2) is wired into the reader later, when parsing begins.
- Line 32 is where the parsing starts, with our isbnFilter. The parse method of XMLFilterImpl first registers the filter itself as the parent reader’s ContentHandler (along with its DTD, entity, and error handlers) and then tells the parent reader to parse. The reader delivers each startElement event to the filter (Listing 2), which does its filtering and then calls super.startElement(). That call in turn passes the event on to isbnHandler, now set as the filter’s content handler. Whew!
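The walkthrough boils down to one detail of XMLFilterImpl: its parse() method installs the filter itself as the parent reader’s ContentHandler (and as its DTD, entity, and error handler) and then calls the parent’s parse(). The self-contained sketch below makes that visible by wiring an equivalent chain twice over an in-memory document, once in the style of Listing 4 and once by hand. It is an illustration under assumptions, not the article’s code: it obtains the reader from JAXP’s SAXParserFactory instead of XMLReaderFactory, and the class names (ChainDemo, AsinFilter, Collector) and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo: the same filter chain wired two ways.
class ChainDemo {

    // Sample input in the Asian purchaser's format (made-up values).
    static final String XML =
        "<books><book asin='XX1111111111'/><book asin='XX2222222222'/></books>";

    // Minimal asin -> isbn filter in the spirit of SaxDataFilter;
    // assumes every asin value carries the two-character XX prefix.
    static class AsinFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            Attributes out = atts;
            int idx = atts.getIndex("asin");
            if (qName.equals("book") && idx >= 0) {
                AttributesImpl copy = new AttributesImpl(atts);
                copy.setQName(idx, "isbn");
                copy.setValue(idx, copy.getValue(idx).substring(2));
                out = copy;
            }
            super.startElement(uri, localName, qName, out);
        }
    }

    // End of the chain: collects every isbn value it receives.
    static class Collector extends DefaultHandler {
        final List<String> isbns = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            String isbn = atts.getValue("isbn");
            if (isbn != null) {
                isbns.add(isbn);
            }
        }
    }

    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Listing 4 style: set the parent, then parse through the filter.
    static List<String> viaFilterParse() throws Exception {
        AsinFilter filter = new AsinFilter();
        Collector collector = new Collector();
        filter.setParent(newReader());
        filter.setContentHandler(collector);  // filter -> collector
        filter.parse(new InputSource(new StringReader(XML)));
        return collector.isbns;
    }

    // The wiring that filter.parse() performs, done by hand.
    static List<String> viaManualWiring() throws Exception {
        XMLReader reader = newReader();
        AsinFilter filter = new AsinFilter();
        Collector collector = new Collector();
        reader.setContentHandler(filter);     // reader -> filter
        filter.setContentHandler(collector);  // filter -> collector
        reader.parse(new InputSource(new StringReader(XML)));
        return collector.isbns;
    }
}
```

Both methods should collect the same two ISBNs. The manual version never calls setParent(), which suggests the parent link exists mainly so that the filter’s own parse() knows which reader to drive.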
If you find this implementation confusing and convoluted, you’re in good company. In XML in a Nutshell (O’Reilly, 2004), Elliotte Rusty Harold wonders out loud why he has a hard time getting SAX filters right, and in Java & XML (O’Reilly, 2001), Brett McLaughlin makes a similar comment. Once you have a working example to study, however, it’s possible to do some powerful things with SAX filters.