Using Parsers
An earlier example used a SAX parser. This section presents further examples, covering both SAX and DOM parsers.
Let's start with the SAX example in Listing 3.1.
Listing 3.1 Java Code Using a SAX Parser
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;
import java.util.*;

public class SAXSample extends DefaultHandler {

    // Stack of the element names that are currently open
    private Stack<String> tagStack = new Stack<String>();

    // List of item names stored as a Vector
    private Vector<String> items = new Vector<String>();

    // Client name
    private String client;

    // Buffer that collects the data delivered by
    // the "characters" SAX event.
    private CharArrayWriter contents = new CharArrayWriter();

    // Override methods of the DefaultHandler class
    // to receive SAX events.
    public void startElement( String namespaceURI, String localName,
                              String qName, Attributes attr )
            throws SAXException {
        contents.reset();
        // push the tag name onto the stack
        tagStack.push( localName );
        // display the current path that has been found...
        System.out.println( "Path: [" + getTagPath() + "]" );
    }

    public void endElement( String namespaceURI, String localName,
                            String qName )
            throws SAXException {
        if ( getTagPath().equals( "/Order/Customer/Name" ) ) {
            client = contents.toString().trim();
        } else if ( getTagPath().equals( "/Order/Items/Item/Name" ) ) {
            items.addElement( contents.toString().trim() );
        }
        // remove the closed element from the stack
        tagStack.pop();
    }

    public void characters( char[] ch, int start, int length )
            throws SAXException {
        // collect the contents into a buffer.
        contents.write( ch, start, length );
    }

    // Build the path string from the current stack
    private String getTagPath() {
        String buffer = "";
        Enumeration<String> e = tagStack.elements();
        while ( e.hasMoreElements() ) {
            buffer = buffer + "/" + e.nextElement();
        }
        return buffer;
    }

    public Vector<String> getItems() {
        return items;
    }

    public String getClientName() {
        return client;
    }

    public static void main( String[] argv ) {
        System.out.println( "SAXSample:" );
        try {
            // Start using a SAX 2 parser
            XMLReader xr = XMLReaderFactory.createXMLReader();
            // Set the ContentHandler
            SAXSample ex1 = new SAXSample();
            xr.setContentHandler( ex1 );
            System.out.println();
            System.out.println( "Tag Paths:" );
            // Parse the file
            xr.parse( new InputSource( new FileReader( "Sample.xml" ) ) );
            System.out.println();
            System.out.println( "Names:" );
            // Display the client name
            System.out.println( "Client Name: " + ex1.getClientName() );
            // Display all item names
            System.out.println( "Order Items list: " );
            Enumeration<String> e = ex1.getItems().elements();
            while ( e.hasMoreElements() ) {
                System.out.println( e.nextElement() );
            }
        } catch ( Exception e ) {
            System.out.println( "ERROR: Stack Trace: " );
            e.printStackTrace();
        }
    }
}
The XML file used to test the preceding code is shown in Listing 3.2.
Listing 3.2 XML Input File for Code Example in Listing 3.1
<?xml version="1.0"?>
<Order>
  <Customer>
    <Name>EvolveWare Inc</Name>
    <Address>Sunnyvale</Address>
  </Customer>
  <Items>
    <Item>
      <ProductCode>098</ProductCode>
      <Name>XML Framework</Name>
      <Price>1232.99</Price>
    </Item>
    <Item>
      <ProductCode>4093</ProductCode>
      <Name>XMLUI 4x</Name>
      <Price>90.88</Price>
    </Item>
  </Items>
</Order>
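Unlike the earlier example, the code in Listing 3.1 keeps all the handlers in one class: the main class itself extends DefaultHandler. As each element opens, startElement() pushes its name onto a Stack object (tagStack), and endElement() pops it again when the element closes. The getTagPath() method joins the names currently on the stack into a path string such as /Order/Customer/Name, which endElement() compares against the paths of interest to decide where to store the collected text.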
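The stack-based path tracking can also be sketched on its own, without the file input. The following is a minimal sketch, not the listing's method: it assumes the JAXP SAXParserFactory rather than XMLReaderFactory, parses an in-memory string, and uses a Deque in place of Stack. The class name PathTrackingSketch and the helper itemNamesIn() are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: tracks the path of open elements during a SAX parse.
public class PathTrackingSketch extends DefaultHandler {
    // Names of the elements that are currently open, outermost first.
    private final Deque<String> openTags = new ArrayDeque<>();
    // Text collected since the most recent start tag.
    private final StringBuilder text = new StringBuilder();
    // Item names found under /Order/Items/Item/Name.
    final List<String> itemNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0);
        // The default JAXP factory is not namespace-aware, so qName is used here.
        openTags.addLast(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (currentPath().equals("/Order/Items/Item/Name")) {
            itemNames.add(text.toString().trim());
        }
        openTags.removeLast();
    }

    // Joins the open element names into a path such as /Order/Items/Item.
    private String currentPath() {
        StringBuilder path = new StringBuilder();
        for (String tag : openTags) {
            path.append('/').append(tag);
        }
        return path.toString();
    }

    // Parses the given XML text and returns the item names it contains.
    static List<String> itemNamesIn(String xml) throws Exception {
        PathTrackingSketch handler = new PathTrackingSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.itemNames;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Order><Customer><Name>EvolveWare Inc</Name></Customer>"
                + "<Items><Item><Name>XML Framework</Name></Item>"
                + "<Item><Name>XMLUI 4x</Name></Item></Items></Order>";
        System.out.println(itemNamesIn(xml)); // prints [XML Framework, XMLUI 4x]
    }
}
```

Because the customer name sits at /Order/Customer/Name, it does not match the item path and is skipped, which is the same distinction endElement() makes in Listing 3.1.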
Next, let's look at a DOM example in Java that uses IBM's XML DOM parser (IBM XML4J). The XML document to be parsed is shown in Listing 3.3; its DTD is given in Listing 3.4 for reference.
Listing 3.3 Product XML Document
<?xml version="1.0"?>
<catalog category="software">
  <name>EvolveWare Software Products</name>
  <itemlist>
    <item>
      <itemname>XML Framework</itemname>
      <cost>1230.34</cost>
      <description>Framework for financial enterprise_solution</description>
      <id>N12323</id>
    </item>
  </itemlist>
</catalog>
Listing 3.4 DTD for XML Document in Listing 3.3
<!DOCTYPE catalog [
  <!ELEMENT catalog (name, itemlist*) >
  <!ATTLIST catalog category (software|hardware|network) #REQUIRED>
  <!ELEMENT name (#PCDATA) >
  <!ELEMENT itemlist (item*) >
  <!ELEMENT item (itemname,cost,description,id) >
  <!ELEMENT itemname (#PCDATA) >
  <!ELEMENT cost (#PCDATA) >
  <!ELEMENT description (#PCDATA) >
  <!ELEMENT id (#PCDATA) >
]>
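To see how a DTD of this kind constrains the catalog document, the following sketch validates a small catalog against an inline DTD. It is an illustration under assumptions: it uses the JAXP DocumentBuilderFactory rather than the IBM parser, the DTD is a version written to match the document in Listing 3.3 (category as an attribute, item elements inside itemlist), and DtdValidationSketch and validationErrors() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: validates a catalog document against an inline DTD.
public class DtdValidationSketch {
    // Assumed DTD for the catalog document, declared as an internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE catalog ["
        + "<!ELEMENT catalog (name, itemlist*)>"
        + "<!ATTLIST catalog category (software|hardware|network) #REQUIRED>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT itemlist (item*)>"
        + "<!ELEMENT item (itemname, cost, description, id)>"
        + "<!ELEMENT itemname (#PCDATA)>"
        + "<!ELEMENT cost (#PCDATA)>"
        + "<!ELEMENT description (#PCDATA)>"
        + "<!ELEMENT id (#PCDATA)>"
        + "]>";

    // Builds a catalog with the given category value and returns the
    // validation error messages reported by the parser (empty if valid).
    static List<String> validationErrors(String category) throws Exception {
        String xml = "<?xml version=\"1.0\"?>" + DOCTYPE
            + "<catalog category=\"" + category + "\">"
            + "<name>EvolveWare Software Products</name>"
            + "<itemlist><item><itemname>XML Framework</itemname>"
            + "<cost>1230.34</cost><description>Framework</description>"
            + "<id>N12323</id></item></itemlist>"
            + "</catalog>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity violations arrive here as recoverable errors.
                errors.add(e.getMessage());
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("software: " + validationErrors("software").size() + " error(s)");
        System.out.println("books: " + validationErrors("books"));
    }
}
```

A category value outside the enumerated list (software, hardware, network) is reported as a validity error, while the document still parses because it remains well-formed.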
The DOM representation of the XML document is shown in Figure 3.1.
Figure 3.1 DOM representation of Listing 3.3.
The following imports are needed to use the IBM parser:
import com.ibm.xml.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
The following code parses the source XML file and returns the resulting DOM Document:
public static Document parseDOM(String sourceFile) throws Exception {
    try {
        // Create a new parser
        DOMParser objParser = new DOMParser();
        // Parse the source file
        objParser.parse(sourceFile);
        // Return the document
        return objParser.getDocument();
    } catch (Exception ex) {
        // Report the failure; the caller receives null
        System.err.println("Failed with exception: " + ex);
    }
    return null;
}
As shown, the code creates an instance of the parser, parses the XML source, and returns a reference to the resulting DOM Document. If parsing fails, it reports the exception and returns null.
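Where the IBM parser is not available, the same three steps (create a parser, parse, return the Document) can be approximated with the JAXP classes that ship with the JDK. This is a sketch under that assumption, not the IBM XML4J API; the class name JaxpDomSketch is hypothetical, and this parseDOM() variant takes the XML text itself rather than a file name so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch: builds a DOM Document with JAXP instead of XML4J.
public class JaxpDomSketch {
    // Parses XML text (not a file name) and returns the DOM Document.
    static Document parseDOM(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document document = parseDOM(
                "<catalog category=\"software\">"
                + "<name>EvolveWare Software Products</name></catalog>");
        Element root = document.getDocumentElement();
        System.out.println(root.getNodeName());            // prints catalog
        System.out.println(root.getAttribute("category")); // prints software
    }
}
```

Unlike the version above, this sketch lets parse exceptions propagate to the caller instead of returning null, so a null check is not needed.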
The following code resides in the main() method; it simply calls printXMLfromDOM() on the document's root element:
    if (document != null) {
        System.out.println("*******Print XML document from DOM Tree**************");
        printXMLfromDOM(document.getDocumentElement());
    } else {
        // parseDOM() returned null, so there is nothing to print
        System.out.println("*******No document to print: parsing failed*******");
    }
} // main() ends here
The printXMLfromDOM() method is as follows:
private static void printXMLfromDOM(Element element) {
    int k;
    NamedNodeMap attributes;
    NodeList children = element.getChildNodes();

    // Start from this element
    System.out.print("<" + element.getNodeName());

    // print attributes inside the element start tag
    attributes = element.getAttributes();
    if (attributes != null) {
        for (k = 0; k < attributes.getLength(); k++) {
            System.out.print(" " + attributes.item(k).getNodeName());
            // quote the value so that the output is well-formed
            System.out.print("=\"" + attributes.item(k).getNodeValue() + "\"");
        }
    }

    // check for element value or sub-elements
    if (element.hasChildNodes()) {
        System.out.print(">");
        // print all child elements in the DOM tree
        for (k = 0; k < children.getLength(); k++) {
            if (children.item(k).getNodeType() == org.w3c.dom.Node.ELEMENT_NODE) {
                // recurse into the child element
                printXMLfromDOM((Element) children.item(k));
            } else if (children.item(k).getNodeType() == org.w3c.dom.Node.TEXT_NODE) {
                System.out.print(children.item(k).getNodeValue());
            }
        } // for loop ends here
        // print end tag
        System.out.print("</" + element.getNodeName() + ">");
    } else {
        // element is empty
        System.out.print(" />");
    } // else ends here
} // printXMLfromDOM ends here
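The preceding code is straightforward: it prints the start tag and attributes of an element, calls itself recursively for each child element, prints any text nodes as they appear, and closes the element, so that the entire XML document is written to the console.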
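The same recursive walk can be checked more easily if it builds a string instead of printing. The following is a minimal sketch under assumptions: it uses the JDK's JAXP parser to obtain the DOM tree, the names DomToStringSketch, toXml() and parse() are hypothetical, and, like the method above, it does not escape special characters in text or attribute values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: serializes a DOM element tree into a String.
public class DomToStringSketch {
    // Returns the element and everything below it as XML text.
    static String toXml(Element element) {
        StringBuilder out = new StringBuilder();
        append(element, out);
        return out.toString();
    }

    private static void append(Element element, StringBuilder out) {
        out.append('<').append(element.getNodeName());

        // Attributes go inside the start tag, with quoted values.
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            out.append(' ').append(attribute.getNodeName())
               .append("=\"").append(attribute.getNodeValue()).append('"');
        }

        NodeList children = element.getChildNodes();
        if (children.getLength() == 0) {
            out.append(" />"); // empty element
            return;
        }

        out.append('>');
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                append((Element) child, out); // recurse into child element
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue()); // no escaping in this sketch
            }
        }
        out.append("</").append(element.getNodeName()).append('>');
    }

    // Helper for the example: parses XML text and returns the root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parse("<catalog category=\"software\">"
                + "<name>EvolveWare</name><itemlist/></catalog>");
        // prints <catalog category="software"><name>EvolveWare</name><itemlist /></catalog>
        System.out.println(toXml(root));
    }
}
```

Returning a string keeps the traversal separate from the output destination, so the result can be compared against the expected XML or written wherever it is needed.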