Creating a SAX-Parsing Application
Now that the JAXP APIs are set up, let's begin the task of creating an application that uses a SAX parser to parse an XML file.
Creating the CarParts.xml File
In this chapter, an XML file that describes the parts of a car will be used as an example. The XML file will be named CarParts.xml. To create it, open the text editor of your choice, copy the following code, and save the file as CarParts.xml:
<?xml version='1.0'?>
<!-- XML file that describes car parts -->
<carparts>
    <engine>
    </engine>
    <carbody>
    </carbody>
    <wheel>
    </wheel>
    <carstereo>
    </carstereo>
</carparts>
Before proceeding to write the application code, let's briefly examine the XML file. The first line
<?xml version='1.0'?>
identifies the file as an XML document.
The next line
<!-- XML file that describes car parts -->
is a comment line that describes the XML document.
The root element of the XML file is carparts; engine, carbody, wheel, and carstereo are the child elements contained within the carparts element.
As is evident, this is a very basic XML file. We will be enhancing the XML file as we progress through the chapter. Now that the XML file is created, let's go over the sequence of steps to create an application that uses a SAX parser.
Using a SAX Parser
To use a SAX parser, you must follow these steps:
Import the JAXP classes.
Get an instance of the SAXParserFactory class.
Using the SAXParserFactory instance, generate an instance of the SAXParser class. The parser wraps an implementation of the XMLReader interface, which provides several parse() methods that can be used to parse the XML document.
Get the XMLReader from the SAXParser instance.
In the XMLReader, register an instance of the event-handling classes.
Provide the XML document to parse.
While the parsing is in process, the XMLReader will invoke the callback methods for different parsing events, such as the start of an element, processing instructions, and so on. These callback methods are defined in the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces. Normally, extending the DefaultHandler class is the most convenient way to use the methods of these interfaces.
Importing the JAXP Classes
Let's begin writing the application code that parses the CarParts.xml file. We will call the application MyXMLHandler.java. MyXMLHandler is a simple application that parses the CarParts.xml file and displays the XML structure in the command window.
The first step is to import the classes necessary for the application to access the JAXP and SAX APIs. In the MyXMLHandler.java file, add the following lines:
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
The javax.xml.parsers package defines the JAXP APIs, the org.xml.sax package defines the interfaces used for the SAX parser, and the org.xml.sax.helpers package defines the helper classes that have the default implementations of the interfaces defined in the org.xml.sax package.
Extending the DefaultHandler Class
Next, write the class declaration by extending the DefaultHandler class and enter the main() method. To do so, enter the following lines:
public class MyXMLHandler extends DefaultHandler {

    static public void main(String[] args) throws Exception {
    }
}
After extending the DefaultHandler class, you need to override the required ContentHandler interface methods.
Setting Up the SAX Parser
Now you will write the code to set up the SAX parser. To do so, add the lines of code in Listing 3.1 to the main() method.
Listing 3.1 Setting Up the SAX Parser
/* Create a SAX parser factory */
SAXParserFactory parseFactory = SAXParserFactory.newInstance();

/* Obtain a SAX parser */
SAXParser saxParser = parseFactory.newSAXParser();

/* XMLReader is the interface for reading an XML document using callbacks */
XMLReader xmlReader = saxParser.getXMLReader();

/* Attach the ContentHandler: callbacks such as startDocument() and
   startElement() are overridden in MyXMLHandler so that user code receives them */
xmlReader.setContentHandler(new MyXMLHandler());

/* Parse the XML document: the document is read and the overridden
   callbacks in MyXMLHandler are invoked */
xmlReader.parse("CarParts.xml");
First, we get an instance of the SAXParserFactory. The actual parser class that gets loaded depends on the value of the javax.xml.parsers.SAXParserFactory system property. In this case, the Xerces parser will be loaded, because that's the default JAXP reference implementation available with the JWSDP download.
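If you are curious which factory implementation was actually loaded in your environment, a small probe like the following can tell you. This is only a sketch: the class name FactoryProbe and the helper factoryClassName() are made up for illustration, and the class name it prints depends entirely on your JDK and classpath.

```java
import javax.xml.parsers.SAXParserFactory;

class FactoryProbe {

    /* Returns the class name of the SAXParserFactory implementation that JAXP selected */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        /* The system property is often unset (null); JAXP then falls back to its default */
        System.out.println("System property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("Factory class:   " + factoryClassName());
    }
}
```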
Next, using the newSAXParser() method of the SAXParserFactory instance, get the SAXParser. From the SAXParser instance, the XMLReader is obtained by using the getXMLReader() method. The XMLReader provides the methods for parsing the XML document and generating events during the document parse. To handle the content events, pass an instance of the class that implements the ContentHandler interface using the setContentHandler() method. In this case, it will be the instance of the application itself, because it extends the DefaultHandler class. Next, using the parse() method of the XMLReader, the parser is informed of which XML file to parse.
Handling the ContentHandler Events
Once the parsing starts, an event is generated whenever the parser recognizes an XML construct, such as the start of an element, and the corresponding method defined in the ContentHandler interface is called. To get any meaningful output, you need to override the required methods of the ContentHandler interface.
To override the methods of the ContentHandler interface, add the following lines in Listing 3.2 to the application. These should be placed outside the main() method.
Listing 3.2 Overriding ContentHandler Methods
public void startDocument() {
    System.out.println("\n Start Document: -----Reading the document CarParts.xml with MyXMLHandler----\n");
}

public void startElement(String namespaceURI, String localName,
                         String qualifiedName, Attributes elementAttributes) {
    System.out.println("Start Element-> " + qualifiedName);
}

public void endElement(String namespaceURI, String localName,
                       String qualifiedName) {
    System.out.println("End Element-> " + qualifiedName);
}

public void endDocument() {
    System.out.println("\n End Document: ----------------Finished Reading the document---------------\n");
}
The startDocument() method is called when the parsing of the CarParts.xml file starts. The startElement() method is invoked whenever the parser finds a start tag, such as <engine>, representing the start of an element. In the CarParts.xml file, it will be called for each element, such as carparts, engine, and so on. Similarly, the endElement() method is invoked whenever the parser finds the matching end tag, such as </engine>.
Note that even though both the XML declaration and the comment start with <, they are not treated as elements. The ContentHandler interface has no callbacks for the XML declaration or for comments, so by default the parser does not report them. You can access the comments by implementing the LexicalHandler interface. This is covered in Chapter 4.
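As a brief preview, the following sketch shows one way comments can be received. It assumes the parser supports the standard SAX lexical-handler property (the parser bundled with current JDKs does), and it extends DefaultHandler2 from the org.xml.sax.ext package, which provides empty LexicalHandler methods. The class name CommentCollector and its parse() helper are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector extends DefaultHandler2 {

    final List<String> comments = new ArrayList<>();

    /* LexicalHandler callback: invoked once for each comment in the document */
    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length).trim());
    }

    /* Hypothetical helper: parses XML held in a String and returns the comments found */
    static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector handler = new CommentCollector();
        reader.setContentHandler(handler);
        /* Lexical events are registered through a property, not a setter method */
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.comments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0'?>"
                + "<!-- XML file that describes car parts -->"
                + "<carparts></carparts>";
        System.out.println(parse(xml));
    }
}
```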
When the parsing completes (when the parser moves to the line after </carparts>), the end-document event is generated. This causes the endDocument() method to be called.
You have now successfully created an application that uses the implementation of a SAX parser (Xerces, in this case) to parse an XML document and process the data.
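For reference, here is a compact, self-contained variation on the same steps. It is a sketch rather than the chapter's example code: the class name EventCollector is invented, the events are collected in a list instead of being printed directly, and the XML is read from a String through an InputSource so that the program runs without a CarParts.xml file on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class EventCollector extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes elementAttributes) {
        events.add("start:" + qualifiedName);
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qualifiedName) {
        events.add("end:" + qualifiedName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    /* Hypothetical helper: factory -> parser -> XMLReader -> handler -> parse */
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory parseFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = parseFactory.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();
        EventCollector handler = new EventCollector();
        xmlReader.setContentHandler(handler);
        xmlReader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0'?><!-- parts -->"
                + "<carparts><engine></engine><wheel></wheel></carparts>";
        for (String event : parse(xml)) {
            System.out.println(event);
        }
    }
}
```

Note that neither the XML declaration nor the comment produces an entry in the list.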
NOTE
The code discussed here is available in the example01 folder. This folder also contains the sample CarParts.xml file.
You can now compile the program by giving the following command in the command prompt:
javac MyXMLHandler.java
Run the program. The output should be similar to Listing 3.3.
Listing 3.3 Output of MyXMLHandler with ContentHandler
Start Document: -----Reading CarParts.xml with MyXMLHandler------

Start Element-> carparts
Start Element-> engine
End Element-> engine
Start Element-> carbody
End Element-> carbody
Start Element-> wheel
End Element-> wheel
Start Element-> carstereo
End Element-> carstereo
End Element-> carparts

End Document: ----------------Finished Reading the document---------------------
Handling Errors
There are times when the SAX parser runs into trouble while trying to parse an XML file. The reasons vary, ranging from the XML document being malformed to the inability to create the parser itself because of a system problem, such as a missing class file.
In such cases, the parser generates an error. The error can be of three types: a fatal error, an error, and a warning.
A fatal error occurs when the parser is unable to continue parsing the XML document, typically because the document is not well-formed. An error occurs when the XML document violates a validity constraint, such as the presence of an invalid tag. A warning is generated when there is a problem that, although not illegal in XML, might have been introduced inadvertently; for example, a tag might have been defined twice in the DTD. Errors and warnings are generated only if there is a DTD and you are using a validating parser. In any case, the parser generates one of the following exceptions:
SAXException
SAXParseException
ParserConfigurationException
SAXException and SAXParseException are thrown when the parser finds an error in the XML file; the callback methods can throw them as well. SAXParseException is a subclass of SAXException that also carries the location of the problem. The SAXParserFactory class throws a ParserConfigurationException if it fails to create a parser.
To handle such errors, you need to provide an exception-handling mechanism. You do this by creating a class that implements the ErrorHandler interface and registering an instance of that class with the XMLReader. The ErrorHandler interface has three methods, fatalError(), error(), and warning(), one for each type of problem the parser can report.
You will now create an error condition in the XML file and add error-handling code to the application. To create the error condition, replace the </engine> line in the CarParts.xml file with </This_is_invalid_xml_data>.
After making the changes, the modified CarParts.xml should appear as in Listing 3.4:
Listing 3.4 CarParts.xml with Errors
<?xml version='1.0'?>
<!-- Invalid XML file that describes car parts -->
<carparts>
    <engine>
    </This_is_invalid_xml_data>
    <carbody>
    </carbody>
    <wheel>
    </wheel>
    <carstereo>
    </carstereo>
</carparts>
Next, the application needs to be updated with the error-handling code:
Create a class that implements the ErrorHandler interface. For our example, we will call the class MyErrorHandler.
Register the class with the XMLReader by using the setErrorHandler() method.
Put a try-catch block around the parse() method to catch the exceptions. You need to catch the SAXExceptions and SAXParseExceptions to handle all parsing errors.
To create the class that implements the ErrorHandler interface, add the lines shown in Listing 3.5.
Listing 3.5 Implementing ErrorHandler
public class MyXMLHandler extends DefaultHandler {

    public static void main(String[] args) throws Exception {
        ...........
    }

    static class MyErrorHandler implements ErrorHandler {

        public void fatalError(SAXParseException saxException) {
            System.out.println("Fatal Error occurred " + saxException);
        }

        public void error(SAXParseException saxException) {
            System.out.println("Error occurred " + saxException);
        }

        public void warning(SAXParseException saxException) {
            System.out.println("warning occurred " + saxException);
        }
    }
} // End of MyXMLHandler
You created a static nested class, MyErrorHandler, which implements the ErrorHandler interface. The three methods of the ErrorHandler interface will now be called for every problem the parser reports while parsing an XML document.
Next, the class is to be registered with the XMLReader. To do so, add the following lines in bold to the application:
/* XMLReader is the interface for reading an XML document using callbacks */
XMLReader xmlReader = saxParser.getXMLReader();

/* Set the error handler */
xmlReader.setErrorHandler(new MyErrorHandler());

xmlReader.setContentHandler(new MyXMLHandler());
Finally, the try-catch block needs to be put around the parse() method. To do so, add the lines of code displayed in Listing 3.6.
Listing 3.6 Adding the try-catch Block
    xmlReader.setContentHandler(new MyXMLHandler());
    try {
        /* Parse the XML document: the document is read and the overridden
           callbacks in MyXMLHandler are invoked */
        xmlReader.parse("CarParts.xml");
    } catch (SAXParseException saxException) {
        /* Errors in the XML data are trapped here and their location
           (line and column) is displayed */
        System.out.println("\n\nError in CarParts.xml at line:"
                + saxException.getLineNumber()
                + "(" + saxException.getColumnNumber() + ")\n");
        System.out.println(saxException.toString());
    } catch (SAXException saxEx) {
        /* For any other SAX error, the detailed message of the
           exception is displayed */
        System.out.println(saxEx.getMessage());
    }
} // end of main method
Now the application is capable of handling all the possible types of errors that can be generated while parsing an XML document.
NOTE
The code discussed here is available in the example02 folder. This folder also contains the sample CarParts.xml file.
Compile and run the application. The output should be similar to the following:
-----Reading the document CarParts.xml with MyXMLHandler------

Start Element-> carparts
Start Element-> engine
Fatal Error occurred org.xml.sax.SAXParseException: Expected "</engine>" to terminate element starting on line 5.

Error in CarParts.xml at line:6(-1)

org.xml.sax.SAXParseException: Expected "</engine>" to terminate element starting on line 5.
As displayed in the listing, when the parser gets a fatal error, it throws a SAXParseException and calls the fatalError() method.
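The same flow can be tried in isolation with the following sketch. The names RecordingErrorHandler and parse() are invented for this illustration, and the handler differs from MyErrorHandler in two ways: it records the reports in a list instead of printing them, and it rethrows the exception from fatalError() so that parsing is guaranteed to stop at that point regardless of the parser implementation. The exact wording of the messages is parser-specific.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class RecordingErrorHandler implements ErrorHandler {

    final List<String> reports = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        reports.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        reports.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        reports.add("fatal (line " + e.getLineNumber() + "): " + e.getMessage());
        throw e; /* design choice for this sketch: stop parsing immediately */
    }

    /* Hypothetical helper: parses XML from a String and returns everything reported */
    static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RecordingErrorHandler handler = new RecordingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            /* The exception raised for the fatal error surfaces here */
            handler.reports.add("caught: " + e.getMessage());
        }
        return handler.reports;
    }

    public static void main(String[] args) throws Exception {
        String broken = "<carparts><engine></This_is_invalid_xml_data></carparts>";
        for (String report : parse(broken)) {
            System.out.println(report);
        }
    }
}
```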
So far, the parsing application can process the elements in the XML file and handle the parsing errors. Next, you will enable the application to handle element attributes.
Handling Attributes
The existing CarParts.xml file does not have any element attributes. To add attributes, update the XML file as in Listing 3.7. The element attributes that are to be added are displayed in bold.
Listing 3.7 Adding Element Attributes
<?xml version='1.0'?>
<!-- XML file that describes car parts -->
<carparts>
    <engine type="Alpha37" capacity="2500" price="3500">
    </engine>
    <carbody type="Tallboy" color="blue">
    </carbody>
    <wheel type="X3527" price="120">
    </wheel>
    <carstereo manufacturer="MagicSound" model="T76w" Price="500">
    </carstereo>
</carparts>
After updating the XML file, update the application code to handle the attributes. There is no separate callback method per se for attributes. When the startElement() method is invoked, the list of attributes for the element is passed to the method as an Attributes object.
The attributes can be accessed in three different ways:
By the attribute index
By the namespace-qualified name
By the qualified (prefixed) name
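The three kinds of lookup can be compared side by side in a sketch like the one below. The class name AttributeLookup and its fields are invented for this illustration. It assumes a namespace-aware parser, because lookups by namespace URI and local name are only meaningful in that mode; an attribute without a prefix belongs to no namespace, so its URI is the empty string.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class AttributeLookup extends DefaultHandler {

    String byIndex;      /* value found through the attribute index */
    String byNamespace;  /* value found through namespace URI + local name */
    String byQName;      /* value found through the qualified name */

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes elementAttributes) {
        int indexOfPrice = elementAttributes.getIndex("price");
        if (indexOfPrice != -1) {
            byIndex = elementAttributes.getValue(indexOfPrice);
        }
        byNamespace = elementAttributes.getValue("", "price");
        byQName = elementAttributes.getValue("price");
    }

    /* Hypothetical helper: parses a one-element document and returns the handler */
    static AttributeLookup parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); /* assumption: needed for the URI-based lookup */
        XMLReader reader = factory.newSAXParser().getXMLReader();
        AttributeLookup handler = new AttributeLookup();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        AttributeLookup wheel = parse("<wheel type='X3527' price='120'/>");
        System.out.println(wheel.byIndex + " " + wheel.byNamespace + " " + wheel.byQName);

        /* Attribute names are case sensitive: Price is not price */
        AttributeLookup stereo = parse("<carstereo model='T76w' Price='500'/>");
        System.out.println(stereo.byQName);
    }
}
```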
In our application, the attributes are accessed by the attribute index and by the qualified (prefixed) name. The application logic is as follows: When the startElement() method is called, the application will look for an attribute called price. If the attribute named price is found, the value of the attribute is printed and the remaining attributes are ignored. If the price attribute is not found, all the attributes and their values are printed. The printing of all attributes is handled through a function named printAllAttributes(). This method takes the Attributes object as its only parameter.
First, the startElement() method is to be updated according to this logic. To do so, add the lines displayed in bold in Listing 3.8.
Listing 3.8 Accessing Element Attributes
public void startElement(String namespaceURI, String localName,
                         String qualifiedName, Attributes elementAttributes) {
    System.out.println("Start Element-> " + qualifiedName);

    /* Get the index of the price attribute */
    int indexOfPrice = elementAttributes.getIndex("price");

    /* If a price attribute does not exist, all the attributes are printed.
       Note that attribute names are case sensitive, so all the attributes
       of carstereo (which has Price, not price) are printed. */
    if (-1 == indexOfPrice)
        printAllAttributes(elementAttributes);
    else
        System.out.println("\tPrice= " + elementAttributes.getValue("price"));
}
Next, add the code for the printAllAttributes() method. To do so, add the lines displayed in bold.
public void printAllAttributes(Attributes elementAttributes) {
    System.out.println("\tTotal Number of Attributes: " + elementAttributes.getLength());
    for (int i = 0; i < elementAttributes.getLength(); i++) {
        System.out.println("\t\tAttribute: " + elementAttributes.getQName(i)
                + " = " + elementAttributes.getValue(i));
    }
}
The application is all set to handle the element attributes.
NOTE
The code discussed here is available in the example03 folder. This folder also contains the sample CarParts.xml file.
Compile and execute the program. The output should be similar to Listing 3.9.
Listing 3.9 Output of MyXMLHandler with Attributes
Start Document: -----Reading CarParts.xml with MyXMLHandler------

Start Element-> carparts
    Total Number of Attributes: 0
Start Element-> engine
    Price= 3500
End Element-> engine
Start Element-> carbody
    Total Number of Attributes: 2
        Attribute: type = Tallboy
        Attribute: color = blue
End Element-> carbody
Start Element-> wheel
    Price= 120
End Element-> wheel
Start Element-> carstereo
    Total Number of Attributes: 3
        Attribute: manufacturer = MagicSound
        Attribute: model = T76w
        Attribute: Price = 500
End Element-> carstereo
End Element-> carparts

End Document: ----------------Finished Reading the document---------------------
Next, the application will be enhanced to handle the processing instructions.
Handling Processing Instructions
Processing instructions are declarations in which you can provide specific instructions for specific applications.
The format for a processing instruction is <?target data?>, where target is the application that is expected to do the processing, and data is the information for the application to process.
Add the lines displayed in bold in Listing 3.10 to add a processing instruction to the CarParts.xml file. Note that in addition to the processing instruction, a new element called supplier is also being added, and each part element now carries an id attribute.
Listing 3.10 Adding a Processing Instruction to CarParts.xml
<carparts>
    <?supplierformat format="X13" version="3.2"?>
    <supplier name="Car Parts Heaven" URL="http://carpartsheaven.com">
    </supplier>
    <engine id="E129" type="Alpha37" capacity="2500" price="3500">
    </engine>
    <carbody id="C32" type="Tallboy" color="blue">
    </carbody>
    <wheel id="W88" type="X3527" price="120">
    </wheel>
    <carstereo id="C2" manufacturer="MagicSound" model="T76w" Price="500">
    </carstereo>
</carparts>
Here the target application is called supplierformat, and the data that the supplierformat application has to process is format="X13" version="3.2".
The processing instructions are handled through the processingInstruction() callback method defined in the ContentHandler interface. In MyXMLHandler, override the processingInstruction() method to display the name of the target application and the data. Both of these are passed as parameters to the processingInstruction() method.
To enhance the application to handle processing instructions, add the lines displayed in bold in Listing 3.11.
Listing 3.11 Implementing the processingInstruction() Method
public void endElement(String namespaceURI, String localName,
                       String qualifiedName) {
    System.out.println("End Element-> " + qualifiedName);
}

public void processingInstruction(java.lang.String target, java.lang.String data) {
    System.out.println("Processing Instruction-> on target: " + target
            + "\n\t\t\t and data: " + data);
}

public void endDocument() {
    System.out.println("End Document: \n---------------- Finished Reading the document---------------\n");
}
This overrides the processingInstruction() method to display the name of the target application and the data that it will process.
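To see exactly what the parser passes as target and data, you can run a stripped-down sketch such as the following. The class name PiCollector and its parse() helper are invented for this illustration; the handler stores each instruction as "target | data" instead of printing it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PiCollector extends DefaultHandler {

    final List<String> instructions = new ArrayList<>();

    /* Called once per processing instruction; the XML declaration is not reported here */
    @Override
    public void processingInstruction(String target, String data) {
        instructions.add(target + " | " + data);
    }

    /* Hypothetical helper: parses XML from a String and returns the instructions found */
    static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PiCollector handler = new PiCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.instructions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0'?>"
                + "<carparts><?supplierformat format=\"X13\" version=\"3.2\"?></carparts>";
        System.out.println(parse(xml));
    }
}
```

The data arrives as a single uninterpreted string. Although it looks like a pair of attributes here, splitting it into format and version is left to the application.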
NOTE
The code discussed here is available in the example04 folder. This folder also contains the sample CarParts.xml file.
Compile and run the application. The output should be similar to Listing 3.12.
Listing 3.12 Output of MyXMLHandler with Processing Instructions
Start Document: -----Reading the document CarParts.xml with MyXMLHandler------

Start Element-> carparts
    Total Number of Attributes: 0
Processing Instruction-> on target: supplierformat
             and data: format="X13" version="3.2"
Start Element-> supplier
    Total Number of Attributes: 2
        Attribute: name = Car Parts Heaven
        Attribute: URL = http://carpartsheaven.com
End Element-> supplier
Start Element-> engine
    Price= 3500
End Element-> engine
Start Element-> carbody
    Total Number of Attributes: 3
        Attribute: id = C32
        Attribute: type = Tallboy
        Attribute: color = blue
End Element-> carbody
Start Element-> wheel
    Price= 120
End Element-> wheel
Start Element-> carstereo
    Total Number of Attributes: 4
        Attribute: id = C2
        Attribute: manufacturer = MagicSound
        Attribute: model = T76w
        Attribute: Price = 500
End Element-> carstereo
End Element-> carparts

End Document: ----------------Finished Reading the document---------------------
The application at this point is capable of processing the elements and their attributes. It can also handle processing instructions and exceptions thrown by the SAX parser.
Next, the application will be updated to handle the content that is present between the tags.
Handling Characters
We will make the following two changes to the CarParts.xml:
The CarParts.xml will be enhanced to handle multiple entries of engines, wheels, car bodies, and stereos.
Add content between the tags.
To make the changes in the CarParts.xml file, add the lines displayed in bold in Listing 3.13.
Listing 3.13 Adding Data to Tags
<?xml version='1.0'?>
<!-- XML file that describes car parts -->
<carparts>
    <?supplierformat format="X13" version="3.2"?>
    <supplier name="Car Parts Heaven" URL="http://carpartsheaven.com">
    </supplier>
    <engines>
        <engine id="E129" type="Alpha37" capacity="2500" price="3500">
            Engine 1
        </engine>
    </engines>
    <carbodies>
        <carbody id="C32" type="Tallboy" color="blue">
            Car Body 1
        </carbody>
    </carbodies>
    <wheels>
        <wheel id="W88" type="X3527" price="120">
            Wheel Set 1
        </wheel>
    </wheels>
    <carstereos>
        <carstereo id="C2" manufacturer="MagicSound" model="T76w" Price="500">
            Car Stereo 1
        </carstereo>
    </carstereos>
</carparts>
Here you've introduced four new elements (engines, carbodies, wheels, and carstereos) to store multiple entries of engines, car bodies, wheels, and car stereos. You've also added the content between the engine, carbody, wheel, and carstereo tags. These are required to demonstrate a specific behavior of the characters() callback method.
Next, the application needs to be updated to handle the content between the tags. The content between the tags is handled by overriding the characters() callback method of the ContentHandler interface.
To override the characters() method to display the content between the tags, add the lines listed in bold in Listing 3.14.
Listing 3.14 Implementing the characters() Method
public void endElement(String namespaceURI, String localName,
                       String qualifiedName) {
    System.out.println("End Element-> " + qualifiedName);
}

public void characters(char[] ch, int start, int length) {
    System.out.println("Characters: " + new String(ch, start, length));
}

public void endDocument() {
    System.out.println("\n End Document: ----------------Finished Reading the document---------------------\n");
}
This overrides the characters() method to display the text between the tags.
NOTE
The code discussed here is available in the example02A01 folder. This folder also contains the sample CarParts.xml file.
Compile and run the program. The output should be similar to Listing 3.15.
Listing 3.15 Output of MyXMLHandler with characters() Method
Start Document: -----Reading the document CarParts.xml with MyXMLHandler------

Start Element-> carparts
    Total Number of Attributes: 0
Characters:
Characters:
Start Element-> supplier
    Total Number of Attributes: 2
        Attribute: name = Car Parts Heaven
        Attribute: URL = http://carpartsheaven.com
Characters:
End Element-> supplier
Characters:
Start Element-> engines
    Total Number of Attributes: 0
Characters:
Start Element-> engine
    Price= 3500
Characters: Engine 1
Characters:
End Element-> engine
Characters:
End Element-> engines
Characters:
Start Element-> carbodies
    Total Number of Attributes: 0
Characters:
Start Element-> carbody
    Total Number of Attributes: 3
        Attribute: id = C32
        Attribute: type = Tallboy
        Attribute: color = blue
Characters: Car Body 1
Characters:
End Element-> carbody
Characters:
End Element-> carbodies
Characters:
Start Element-> wheels
    Total Number of Attributes: 0
Characters:
Start Element-> wheel
    Price= 120
Characters: Wheel Set 1
Characters:
End Element-> wheel
Characters:
End Element-> wheels
Characters:
Start Element-> carstereos
    Total Number of Attributes: 0
Characters:
Start Element-> carstereo
    Total Number of Attributes: 4
        Attribute: id = C2
        Attribute: manufacturer = MagicSound
        Attribute: model = T76w
        Attribute: Price = 500
Characters: Car Stereo 1
Characters:
End Element-> carstereo
Characters:
End Element-> carstereos
Characters:
End Element-> carparts

End Document: ----------------Finished Reading the document---------------------
Notice that in the output, the characters() method is called between two elements, even if the element contains child elements and no text. For example, the characters() method is called between the carbodies and carbody elements, even though the carbodies element does not contain text and only contains the carbody child element. This happens because, without a DTD, the parser cannot tell whether the whitespace between tags (the line breaks and indentation) is significant, so it reports that whitespace as character data. As you will see later, using a DTD can ensure that the parser is able to distinguish which elements contain text and which elements contain child elements.
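Because the text of one element may arrive in more than one characters() call, and whitespace-only calls are mixed in, a common approach is to buffer the characters and examine them when the element ends. The following is a minimal sketch of that idea, not part of the chapter's example code: the class name TextCollector is invented, and the approach assumes that an element contains either text or child elements, not a mixture of both.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {

    /* Element name -> trimmed text, kept in document order */
    final Map<String, String> textByElement = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes elementAttributes) {
        buffer.setLength(0); /* discard whitespace seen before this start tag */
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); /* may be called several times per element */
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qualifiedName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            textByElement.put(qualifiedName, text);
        }
        buffer.setLength(0);
    }

    /* Hypothetical helper: parses XML from a String and returns the collected text */
    static Map<String, String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TextCollector handler = new TextCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.textByElement;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<engines>\n  <engine id='E129'>\n    Engine 1\n  </engine>\n</engines>";
        System.out.println(parse(xml));
    }
}
```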
Handling a DTD and Entities
So far, we have worked with an XML document that has no Document Type Definition (DTD). A DTD describes the structure of the content of an XML document. It defines the elements and their order and relationship with each other. A DTD also defines the element attributes and whether the elements and attributes are mandatory or optional. Well-formedness does not depend on a DTD; what a DTD lets you check is that a well-formed XML document is also valid. See Appendix B, "XML: A Quick Tour," to learn more about DTDs.
Next we will create a DTD for our CarParts.xml file and update our application to handle the DTD. To add the DTD to the CarParts.xml file, add the lines displayed in bold in Listing 3.16.
Note that in the DTD, we defined two entities (companyname and companyweb) and used them in the name and URL attributes of the supplier element. Entities are analogous to a macro, and they're a good way to represent information that appears multiple times in an XML document.
Listing 3.16 The DTD for CarParts.xml
<?xml version='1.0'?>
<!-- XML file that describes car parts -->
<!DOCTYPE carparts[
<!ENTITY companyname "Heaven Car Parts (TM)">
<!ENTITY companyweb "http://carpartsheaven.com">
<!ELEMENT carparts (supplier,engines,carbodies,wheels,carstereos)>
<!ELEMENT engines (engine+)>
<!ELEMENT carbodies (carbody+)>
<!ELEMENT wheels (wheel+)>
<!ELEMENT carstereos (carstereo+)>
<!ELEMENT supplier EMPTY>
<!ATTLIST supplier
    name CDATA #REQUIRED
    URL CDATA #REQUIRED
>
<!ELEMENT engine (#PCDATA)*>
<!ATTLIST engine
    id CDATA #REQUIRED
    type CDATA #REQUIRED
    capacity (1000 | 2000 | 2500 ) #REQUIRED
    price CDATA #IMPLIED
    text CDATA #IMPLIED
>
<!ELEMENT carbody (#PCDATA)*>
<!ATTLIST carbody
    id CDATA #REQUIRED
    type CDATA #REQUIRED
    color CDATA #REQUIRED
>
<!ELEMENT wheel (#PCDATA)*>
<!ATTLIST wheel
    id CDATA #REQUIRED
    type CDATA #REQUIRED
    price CDATA #IMPLIED
    size (X | Y | Z) #IMPLIED
>
<!ELEMENT carstereo (#PCDATA)*>
<!ATTLIST carstereo
    id CDATA #REQUIRED
    manufacturer CDATA #REQUIRED
    model CDATA #REQUIRED
    price CDATA #REQUIRED
>
]>
<carparts>
    <?supplierformat format="X13" version="3.2"?>
    <supplier name="&companyname;" URL="&companyweb;">
    </supplier>
    <engines>
        <engine id="E129" type="Alpha37" capacity="2500" price="3500">
            Engine 1
        </engine>
    </engines>
    <carbodies>
        <carbody id="C32" type="Tallboy" color="blue">
            Car Body 1
        </carbody>
    </carbodies>
    <wheels>
        <wheel id="W88" type="X3527" price="120">
            Wheel Set 1
        </wheel>
    </wheels>
    <carstereos>
        <carstereo id="C2" manufacturer="MagicSound" model="T76w" Price="500">
            Car Stereo 1
        </carstereo>
    </carstereos>
</carparts>
Run the application with the updated CarParts.xml file.
NOTE
The code discussed here is available in the example02A02 folder. This folder also contains the sample CarParts.xml file.
The output should be similar to Listing 3.17.
Listing 3.17 Output of MyXMLHandler with DTD
Start Document: -----Reading the document CarParts.xml with MyXMLHandler------

Start Element-> carparts
    Total Number of Attributes: 0
Start Element-> supplier
    Total Number of Attributes: 2
        Attribute: name = Heaven Car Parts (TM)
        Attribute: URL = http://carpartsheaven.com
Characters:
End Element-> supplier
Start Element-> engines
    Total Number of Attributes: 0
Start Element-> engine
    Price= 3500
Characters: Engine 1
Characters:
End Element-> engine
End Element-> engines
Start Element-> carbodies
    Total Number of Attributes: 0
Start Element-> carbody
    Total Number of Attributes: 3
        Attribute: id = C32
        Attribute: type = Tallboy
        Attribute: color = blue
Characters: Car Body 1
Characters:
End Element-> carbody
End Element-> carbodies
Start Element-> wheels
    Total Number of Attributes: 0
Start Element-> wheel
    Price= 120
Characters: Wheel Set 1
Characters:
End Element-> wheel
End Element-> wheels
Start Element-> carstereos
    Total Number of Attributes: 0
Start Element-> carstereo
    Total Number of Attributes: 4
        Attribute: id = C2
        Attribute: manufacturer = MagicSound
        Attribute: model = T76w
        Attribute: Price = 500
Characters: Car Stereo 1
Characters:
End Element-> carstereo
End Element-> carstereos
End Element-> carparts

End Document: ----------------Finished Reading the document---------------------
Notice that with the DTD in place, the characters() method is no longer called for the whitespace inside elements that the DTD declares as containing only child elements (carparts, engines, carbodies, wheels, and carstereos). The parser now treats that whitespace as ignorable. If you want to receive it, you will need to override the ignorableWhitespace() method of the ContentHandler interface.
One scenario in which you might override the ignorableWhitespace() method is when your application has to process an XML file to generate another human-readable XML file. Overriding the ignorableWhitespace() method helps retain the formatting of the XML document.
Also, the companyname and the companyweb entities have been resolved and replaced with their values in the output.
So far, we have been using a nonvalidating parser to parse the XML document. A nonvalidating parser can check whether a document is well-formed. However, it cannot determine whether the document is valid with respect to the DTD. For this, a validating parser is required.
Using a Validating Parser
To generate a validating parser, add the lines listed in bold:
static public void main(String[] args) throws Exception {
    ......
    /* Create a SAX parser factory */
    SAXParserFactory parseFactory = SAXParserFactory.newInstance();

    /* Set the factory to generate a validating SAX parser */
    parseFactory.setValidating(true);

    /* Obtain a validating SAX parser */
    SAXParser saxParser = parseFactory.newSAXParser();
    ......
}
The setValidating() method configures the factory to produce parsers that validate the XML document as it is being parsed. All parsers subsequently created from this instance of the factory will also be validating.
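The effect of validation can be tried on a much smaller document with the sketch below. The class name ValidationDemo, its validate() helper, and the two-element DTD are invented for this illustration. Validity problems are delivered to the error() method of the ErrorHandler, so the sketch collects those messages; their exact wording depends on the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ValidationDemo {

    /* A tiny DTD for illustration: wheel requires a lower-case price attribute */
    static final String DTD =
            "<!DOCTYPE carparts [\n"
          + "  <!ELEMENT carparts (wheel+)>\n"
          + "  <!ELEMENT wheel (#PCDATA)>\n"
          + "  <!ATTLIST wheel price CDATA #REQUIRED>\n"
          + "]>\n";

    /* Hypothetical helper: returns the validity errors reported for the document body */
    static List<String> validate(String body) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> errors = new ArrayList<>();
        reader.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        reader.parse(new InputSource(new StringReader("<?xml version='1.0'?>\n" + DTD + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        /* Valid: the required attribute is spelled exactly as declared */
        System.out.println(validate("<carparts><wheel price='120'>Wheel Set 1</wheel></carparts>"));
        /* Invalid: Price is undeclared, and the required price is missing */
        System.out.println(validate("<carparts><wheel Price='120'>Wheel Set 1</wheel></carparts>"));
    }
}
```

Parsing continues after each error() call, which is why a single run can report several validity problems.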
NOTE
The code discussed here is available in the example02A03 folder. This folder also contains the sample CarParts.xml file.
Compile and run the program. The output should be similar to Listing 3.18.
Listing 3.18 Output of MyXMLHandler with Parser in Validating Mode
Version 2A03.0 of MyXMLHandler in example02A03

Start Document: -----Reading the document CarParts.xml with MyXMLHandler------

Start Element-> carparts
    Total Number of Attributes: 0
Start Element-> supplier
    Total Number of Attributes: 2
        Attribute: name = Heaven Car Parts (TM)
        Attribute: URL = http://carpartsheaven.com
Characters:
Error occurred org.xml.sax.SAXParseException: The content of element type "supplier" must match "EMPTY".
End Element-> supplier
Start Element-> engines
    Total Number of Attributes: 0
Start Element-> engine
    Price= 3500
Characters: Engine 1
Characters:
End Element-> engine
End Element-> engines
Start Element-> carbodies
    Total Number of Attributes: 0
Start Element-> carbody
    Total Number of Attributes: 3
        Attribute: id = C32
        Attribute: type = Tallboy
        Attribute: color = blue
Characters: Car Body 1
Characters:
End Element-> carbody
End Element-> carbodies
Start Element-> wheels
    Total Number of Attributes: 0
Start Element-> wheel
    Price= 120
Characters: Wheel Set 1
Characters:
End Element-> wheel
End Element-> wheels
Start Element-> carstereos
    Total Number of Attributes: 0
Error occurred org.xml.sax.SAXParseException: Attribute "price" is required and must be specified for element type "carstereo".
Error occurred org.xml.sax.SAXParseException: Attribute "Price" must be declared for element type "carstereo".
Start Element-> carstereo
    Total Number of Attributes: 4
        Attribute: id = C2
        Attribute: manufacturer = MagicSound
        Attribute: model = T76w
        Attribute: Price = 500
Characters: Car Stereo 1
Characters:
End Element-> carstereo
End Element-> carstereos
End Element-> carparts

End Document: ----------------Finished Reading the document---------------------
Notice that the parser generates two errors related to the price attribute of the carstereo element. Going back to the CarParts.xml file, you will notice that while the DTD defines the attribute as price, the actual XML document itself has it listed as Price.
Therefore, by using a combination of DTD and a validating parser, you can ensure that your XML document is both well-formed and valid.