- Sams Teach Yourself XML in 21 Days, Third Edition
Using SAX
Today's first example shows how to work with SAX. You'll revisit the example you used yesterday, this time parsing with SAX: you're going to extract all the data from ch17_01.xml, which is shown in Listing 17.1.
Listing 17.1. A Sample XML Document for Use with SAX Methods (ch17_01.xml)
<?xml version="1.0" encoding="UTF-8"?>
<session>
    <committee type="monetary">
        <title>Finance</title>
        <number>17</number>
        <subject>Donut Costs</subject>
        <date>7/15/2005</date>
        <attendees>
            <senator status="present">
                <firstName>Thomas</firstName>
                <lastName>Smith</lastName>
            </senator>
            <senator status="absent">
                <firstName>Frank</firstName>
                <lastName>McCoy</lastName>
            </senator>
            <senator status="present">
                <firstName>Jay</firstName>
                <lastName>Jones</lastName>
            </senator>
        </attendees>
    </committee>
</session>
How do you handle this XML document by using SAX? You start in the main method by calling a new version of the childLoop method created yesterday. This method will fill the same array of strings, displayText, and you'll store the number of strings in a variable named numberLines. When the childLoop method is done, all you have to do is display all the text in the displayText array:
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ch17_02 extends DefaultHandler {
    static int numberLines = 0;
    static String indentation = "";
    static String displayText[] = new String[1000];

    public static void main(String args[]) {
        ch17_02 parser = new ch17_02();
        parser.childLoop(args[0]);

        for(int loopIndex = 0; loopIndex < numberLines; loopIndex++){
            System.out.println(displayText[loopIndex]);
        }
    }
In the childLoop method, you need two objects: a DefaultHandler object and a SAXParserFactory object. The DefaultHandler object tells SAX which object to call when it encounters various nodes, and you'll use the present application object, which you've based on the DefaultHandler class:
public class ch17_02 extends DefaultHandler { . . .
To refer to the current object, use the Java this keyword. Here's how to create the SAXParserFactory object:
public void childLoop(String uri) {
    DefaultHandler saxHandler = this;
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    .
    .
    .
}
Now create a new SAX parser by using this SAXParserFactory object, and this SAX parser will parse the XML document ch17_01.xml:
public void childLoop(String uri) {
    DefaultHandler saxHandler = this;
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    try {
        SAXParser saxParser = saxFactory.newSAXParser();
        saxParser.parse(new File(uri), saxHandler);
    } catch (Throwable t) {
        // Report the failure instead of silently ignoring it
        System.err.println("Parsing failed: " + t);
    }
}
Table 17.1 lists the significant methods of the SAXParserFactory class, and Table 17.2 lists the significant methods of the SAXParser class.
Table 17.1. Methods of the javax.xml.parsers.SAXParserFactory Class
| Method | What It Does |
| --- | --- |
| protected SAXParserFactory() | Acts as the default constructor for the class. |
| boolean isNamespaceAware() | Returns true if the factory is configured to produce parsers that use XML namespaces. |
| boolean isValidating() | Returns true if the factory is configured to produce parsers that validate the XML content. |
| static SAXParserFactory newInstance() | Returns a new SAXParserFactory object. |
| abstract SAXParser newSAXParser() | Returns a new SAXParser object. |
| void setNamespaceAware(boolean awareness) | Specifies whether the parsers this factory produces support XML namespaces. |
| void setValidating(boolean validating) | Specifies whether the parsers this factory produces validate XML documents. |
Table 17.2. Methods of the SAXParser Class
| Method | What It Does |
| --- | --- |
| protected SAXParser() | Acts as the default class constructor. |
| abstract Parser getParser() | Returns the SAX parser that this class wraps. |
| abstract boolean isNamespaceAware() | Returns true if this parser can understand namespaces. |
| abstract boolean isValidating() | Returns true if this parser is configured to validate XML documents. |
| void parse(File f, DefaultHandler dh) | Parses the content of the specified file. |
| void parse(InputSource is, DefaultHandler dh) | Parses the content of the specified InputSource object. |
| void parse(InputStream is, DefaultHandler dh) | Parses the content of the specified InputStream object. |
| void parse(String uri, DefaultHandler dh) | Parses the content at the given URI, using the specified DefaultHandler object. |
| abstract void setProperty(String name, Object value) | Sets a property in the parser. |
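To make the factory and parser methods in Tables 17.1 and 17.2 more concrete, here is a minimal sketch that configures a factory, creates a parser, and parses a string through the parse(InputSource, DefaultHandler) overload. The class name FactoryConfigSketch and its helper methods are hypothetical and are not part of the chapter's ch17_02 example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class FactoryConfigSketch {

    // Creates a non-validating parser; namespace support is chosen by the caller.
    static SAXParser newParser(boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.setValidating(false);
            return factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }

    // Counts the elements in an XML string by counting startElement callbacks.
    static int countElements(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        };
        try {
            newParser(true).parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        SAXParser parser = newParser(true);
        System.out.println("Namespace aware: " + parser.isNamespaceAware());
        System.out.println("Validating: " + parser.isValidating());
        System.out.println("Elements: "
            + countElements("<session><committee><title>Finance</title></committee></session>"));
    }
}
```

The factory settings apply to parsers created after they are set, which is why the sketch configures the factory before calling newSAXParser.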
Now you've connected the SAX parser to your program and launched it, which means it will call various methods in your code to handle various types of nodes. It can do this because you've based the program's main class on the SAX DefaultHandler class:
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ch17_02 extends DefaultHandler
    .
    .
    .
The DefaultHandler class has a number of predefined methods, called callback methods, that the SAX parser will call:
- characters — Called by the SAX parser for text nodes.
- endDocument — Called by the SAX parser when the end of the document is seen.
- endElement — Called by the SAX parser when the closing tag of an element is seen.
- startDocument — Called by the SAX parser when the start of the document is seen.
- startElement — Called by the SAX parser when the opening tag of an element is seen.
All the required callback methods are already implemented in the DefaultHandler class, but they don't do anything. That means you only have to implement the methods you want to use, such as startDocument to catch the beginning of the document or endDocument to catch the end of the document, as described later today. Table 17.3 lists the significant methods of the DefaultHandler class.
Table 17.3. Methods of the DefaultHandler Class
| Method | What It Does |
| --- | --- |
| DefaultHandler() | Acts as the default class constructor. |
| void characters(char[] ch, int start, int length) | Handles text nodes. |
| void endDocument() | Handles the end of the document. |
| void endElement(String uri, String localName, String qName) | Handles the end of an element. |
| void error(SAXParseException e) | Handles a recoverable parser error. |
| void fatalError(SAXParseException e) | Reports a fatal parsing error. |
| void ignorableWhitespace(char[] ch, int start, int length) | Handles ignorable whitespace (such as that used to indent a document) in element content. |
| void notationDecl(String name, String publicId, String systemId) | Handles a notation declaration. |
| void processingInstruction(String target, String data) | Handles an XML processing instruction (such as <?xml-stylesheet?>). |
| InputSource resolveEntity(String publicId, String systemId) | Resolves an external entity. |
| void setDocumentLocator(Locator locator) | Sets a Locator object for document events. |
| void skippedEntity(String name) | Handles a skipped XML entity. |
| void startDocument() | Handles the beginning of the document. |
| void startElement(String uri, String localName, String qName, Attributes attributes) | Handles the start of an element. |
| void startPrefixMapping(String prefix, String uri) | Handles the start of a namespace mapping. |
| void unparsedEntityDecl(String name, String publicId, String systemId, String notationName) | Handles an unparsed entity declaration. |
| void warning(SAXParseException e) | Handles a parser warning. |
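One way to get a feel for the callback methods is to record the order in which the parser calls them. The following sketch overrides five of the methods from Table 17.3 and stores each event in a list; the class name CallbackOrderSketch and the trace helper are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class CallbackOrderSketch extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only text so the trace stays short
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters:" + text);
        }
    }

    // Parses the XML string and returns the callbacks in the order they arrived.
    static List<String> trace(String xml) {
        CallbackOrderSketch handler = new CallbackOrderSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : trace("<committee><title>Finance</title></committee>")) {
            System.out.println(event);
        }
    }
}
```

Because the methods that are not overridden keep their do-nothing implementations from DefaultHandler, the sketch only has to supply the callbacks it cares about.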
Let's start by handling the start of the document.
Handling the Start of a Document
To handle the start of a document, you can implement the DefaultHandler startDocument method:
public void startDocument() { . . . }
When this method is called, the SAX parser has already seen the beginning of the document. SAX does not pass the document's own XML declaration to your handler, so just put a generic XML declaration into the displayText array:
public void startDocument() {
    displayText[numberLines] = indentation;
    displayText[numberLines] += "<?xml version=\"1.0\" encoding=\""+
        "UTF-8" + "\"?>";
    numberLines++;
}
Handling Processing Instructions
You can handle processing instructions by using the DefaultHandler processingInstruction method, which is called automatically when the SAX parser finds a processing instruction. The method is passed the target of the processing instruction and the data for the processing instruction, which means you can handle processing instructions like this:
public void processingInstruction(String target, String data) {
    displayText[numberLines] = indentation;
    displayText[numberLines] += "<?";
    displayText[numberLines] += target;
    if (data != null && data.length() > 0) {
        displayText[numberLines] += ' ';
        displayText[numberLines] += data;
    }
    displayText[numberLines] += "?>";
    numberLines++;
}
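To see what the target and data arguments contain, the following sketch parses a small document that starts with an xml-stylesheet processing instruction and records what the parser passes in. The class name ProcessingInstructionSketch, the collect helper, and the style.css file name are assumptions made for illustration. Note that the XML declaration is not a processing instruction, so it is not reported through this method.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class ProcessingInstructionSketch extends DefaultHandler {
    private final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // target is the name after "<?"; data is everything after the separating whitespace
        instructions.add(target + " -> " + data);
    }

    // Returns one "target -> data" string for each processing instruction found.
    static List<String> collect(String xml) {
        ProcessingInstructionSketch handler = new ProcessingInstructionSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.instructions;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
            + "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?>"
            + "<session/>";
        for (String instruction : collect(xml)) {
            System.out.println(instruction);
        }
    }
}
```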
Handling the Start of an Element
To handle the start of an element, use the startElement SAX method. This method is passed the namespace URI of the element, the local (unqualified) name of the element, the qualified name of the element, and the element's attributes (as an Attributes object):
public void startElement(String uri, String localName, String qualifiedName, Attributes attributes) { . . . }
Store the element's name in the displayText array, like this:
public void startElement(String uri, String localName,
    String qualifiedName, Attributes attributes) {
    displayText[numberLines] = indentation;

    // Four spaces per nesting level; endElement removes the same four
    indentation += "    ";

    displayText[numberLines] += '<';
    displayText[numberLines] += qualifiedName;
    .
    .
    .
    displayText[numberLines] += '>';
    numberLines++;
}
So far, so good. But what if the element has attributes?
Handling Attributes
If the element has attributes, loop over them. The attributes arrive in the Attributes object passed to you in the startElement method. SAX parsers normally pass an empty Attributes object, rather than null, when an element has no attributes, so the null check here is only a safeguard:
public void startElement(String uri, String localName,
    String qualifiedName, Attributes attributes) {
    displayText[numberLines] = indentation;

    // Four spaces per nesting level; endElement removes the same four
    indentation += "    ";

    displayText[numberLines] += '<';
    displayText[numberLines] += qualifiedName;
    if (attributes != null) {
        .
        .
        .
    }
    displayText[numberLines] += '>';
    numberLines++;
}
Table 17.4 lists the methods of Attributes objects. You can reach the attributes in an object that implements this interface based on index, name, or namespace-qualified name.
Table 17.4. Attributes Interface Methods
| Method | What It Does |
| --- | --- |
| int getIndex(java.lang.String uri, java.lang.String localName) | Returns the index of an attribute, by namespace and local name. |
| int getIndex(java.lang.String qualifiedName) | Returns the index of an attribute, given its qualified name. |
| int getLength() | Returns the number of attributes in the list. |
| java.lang.String getLocalName(int index) | Returns an attribute's local name, by index. |
| java.lang.String getQName(int index) | Returns an attribute's qualified name, by index. |
| java.lang.String getType(int index) | Returns an attribute's type, by index. |
| java.lang.String getType(java.lang.String qualifiedName) | Returns an attribute's type, by qualified name. |
| java.lang.String getType(java.lang.String uri, java.lang.String localName) | Returns an attribute's type, by namespace and local name. |
| java.lang.String getURI(int index) | Returns an attribute's namespace URI, by index. |
| java.lang.String getValue(int index) | Returns an attribute's value, by index. |
| java.lang.String getValue(java.lang.String qualifiedName) | Returns an attribute's value, by qualified name. |
| java.lang.String getValue(java.lang.String uri, java.lang.String localName) | Returns an attribute's value, by namespace name and local name. |
Now loop over the attributes and use the getQName (get qualified name) and getValue methods to store the name and value of each attribute:
public void startElement(String uri, String localName,
    String qualifiedName, Attributes attributes) {
    displayText[numberLines] = indentation;

    // Four spaces per nesting level; endElement removes the same four
    indentation += "    ";

    displayText[numberLines] += '<';
    displayText[numberLines] += qualifiedName;
    if (attributes != null) {
        int numberAttributes = attributes.getLength();
        for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) {
            displayText[numberLines] += ' ';
            displayText[numberLines] += attributes.getQName(loopIndex);
            displayText[numberLines] += "=\"";
            displayText[numberLines] += attributes.getValue(loopIndex);
            displayText[numberLines] += '"';
        }
    }
    displayText[numberLines] += '>';
    numberLines++;
}
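Looping by index is one way to reach attributes; Table 17.4 also lists getValue(String qualifiedName), which looks up a single attribute by name. As a rough sketch, the following handler collects the status attribute of every senator element, using the element and attribute names from ch17_01.xml. The class name AttributeLookupSketch, the senatorStatuses helper, and the "(none)" placeholder are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class AttributeLookupSketch extends DefaultHandler {
    private final List<String> statuses = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if ("senator".equals(qName)) {
            // getValue returns null when the element has no such attribute
            String status = attributes.getValue("status");
            statuses.add(status == null ? "(none)" : status);
        }
    }

    // Returns the status of each senator element, in document order.
    static List<String> senatorStatuses(String xml) {
        AttributeLookupSketch handler = new AttributeLookupSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.statuses;
    }

    public static void main(String[] args) {
        String xml = "<attendees>"
            + "<senator status=\"present\"><lastName>Smith</lastName></senator>"
            + "<senator status=\"absent\"><lastName>McCoy</lastName></senator>"
            + "</attendees>";
        System.out.println(senatorStatuses(xml));
    }
}
```

Looking up by name is convenient when you only need one or two known attributes; the indexed loop in the chapter's code is the better fit when you want to reproduce every attribute.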
Next, you'll take a look at handling text.
Handling Text
In SAX, you handle text by using the characters method. This method is passed an array of characters, the location in that array where the text for the current text node starts, and the length of the text in the text node:
public void characters(char characters[], int start, int length) { . . . }
Here's how to handle the text of a text node, adding it to the displayText array:
public void characters(char characters[], int start, int length) {
    String characterData = (new String(characters, start, length)).trim();
    if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
        displayText[numberLines] = indentation;
        displayText[numberLines] += characterData;
        numberLines++;
    }
}
A SAX parser can also call a method named ignorableWhitespace when it finds whitespace in element content, such as whitespace used for indentation. Validating parsers report such whitespace this way; non-validating parsers may report it through the characters method instead. If you want to handle that text like any other text, you can simply pass it on to the characters method you just implemented (note that this line is commented out here because this example supplies its own indentation):
public void ignorableWhitespace(char characters[], int start, int length) {
    //characters(characters, start, length);
}
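A SAX parser is allowed to deliver the text of a single text node in more than one call to characters, for example around an entity reference. The chapter's code stores each call as its own line, which is fine for display. If you need the complete text of an element, one common approach is to append every chunk to a buffer and read the buffer when the element ends. The following is a minimal sketch of that approach; the class name TextAccumulatorSketch and the leafText helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class TextAccumulatorSketch extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        buffer.setLength(0);   // start collecting fresh text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);   // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            results.add(qName + "=" + text);
        }
        buffer.setLength(0);   // text after a child element starts from empty again
    }

    // Returns "name=text" for each element that directly contains non-whitespace text.
    static List<String> leafText(String xml) {
        TextAccumulatorSketch handler = new TextAccumulatorSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.results;
    }

    public static void main(String[] args) {
        System.out.println(leafText(
            "<committee><title>Finance</title><subject>Donut &amp; Coffee Costs</subject></committee>"));
    }
}
```

This sketch assumes simple content, where an element holds either text or child elements; mixed content would need a stack of buffers rather than a single one.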
Handling the End of Elements
Besides the startElement method, which is called when the SAX parser sees the beginning of an element, you can also implement the endElement method to handle an element's closing tag. It first removes one level of indentation (four characters) and then stores the closing tag. Here's how that looks in this example:
public void endElement(String uri, String localName, String qualifiedName) {
    indentation = indentation.substring(0, indentation.length() - 4);
    displayText[numberLines] = indentation;
    displayText[numberLines] += "</";
    displayText[numberLines] += qualifiedName;
    displayText[numberLines] += '>';
    numberLines++;
}
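Because every startElement call is eventually matched by an endElement call, the two methods together can track where the parser is in the document. The chapter's code does this with an indentation string. As an alternative sketch, the following handler keeps the names of the open elements on a stack and records the path to each element as it starts. The class name ElementPathSketch and the paths helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class ElementPathSketch extends DefaultHandler {
    private final Deque<String> openElements = new ArrayDeque<>();
    private final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        openElements.addLast(qName);                    // one level deeper
        paths.add(String.join("/", openElements));      // for example, session/committee/title
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();                      // back up one level
    }

    // Returns the slash-separated path of every element, in document order.
    static List<String> paths(String xml) {
        ElementPathSketch handler = new ElementPathSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        for (String path : paths(
                "<session><committee><title>Finance</title><number>17</number></committee></session>")) {
            System.out.println(path);
        }
    }
}
```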
Handling Errors and Warnings
SAX makes it easy to handle warnings and errors. You can implement the warning method to handle warnings, the error method to handle errors, and the fatalError method to handle errors that the SAX parser considers fatal enough to make it stop processing. Here's what the error handling looks like in this example:
public void warning(SAXParseException exception) {
    System.err.println("Warning: " + exception.getMessage());
}

public void error(SAXParseException exception) {
    System.err.println("Error: " + exception.getMessage());
}

public void fatalError(SAXParseException exception) {
    System.err.println("Fatal error: " + exception.getMessage());
}
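The handlers above only print a message. If the calling code needs to know that parsing failed, one option is to rethrow the SAXParseException from error and fatalError and catch it around the call to parse. The following sketch takes that approach and uses the exception's getLineNumber method to report where the problem was found. The class name StrictErrorsSketch and the firstProblem helper are hypothetical and are not part of ch17_02.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, used only for this sketch.
class StrictErrorsSketch extends DefaultHandler {

    @Override
    public void warning(SAXParseException exception) {
        // Warnings are reported but do not stop the parse
        System.err.println("Warning at line " + exception.getLineNumber()
            + ": " + exception.getMessage());
    }

    @Override
    public void error(SAXParseException exception) throws SAXException {
        throw exception;   // treat recoverable errors as failures too
    }

    @Override
    public void fatalError(SAXParseException exception) throws SAXException {
        throw exception;
    }

    // Returns null if the XML parses cleanly, or a description of the first problem.
    static String firstProblem(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new StrictErrorsSketch());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstProblem("<session><committee></session>"));
        System.out.println(firstProblem("<session><committee/></session>"));
    }
}
```

The first document in main is not well formed (the committee element is never closed), so the parser reports a fatal error; the second is well formed, so firstProblem returns null.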
And that's it—now run your new SAX code and parse ch17_01.xml like this:
%java ch17_02 ch17_01.xml
As shown in Figure 17.1, you've been able to read and extract all the data in the XML document by using SAX methods.
Figure 17.1 Parsing an XML document by using a SAX parser.
The complete Java code is in Listing 17.2.
Listing 17.2. Parsing an XML Document by Using Java SAX (ch17_02.java)
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class ch17_02 extends DefaultHandler {
    static int numberLines = 0;
    static String indentation = "";
    static String displayText[] = new String[1000];

    public static void main(String args[]) {
        ch17_02 parser = new ch17_02();
        parser.childLoop(args[0]);

        // Display every line the callback methods stored
        for(int loopIndex = 0; loopIndex < numberLines; loopIndex++){
            System.out.println(displayText[loopIndex]);
        }
    }

    public void childLoop(String uri) {
        DefaultHandler saxHandler = this;
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxFactory.newSAXParser();
            saxParser.parse(new File(uri), saxHandler);
        } catch (Throwable t) {
            // Report the failure instead of silently ignoring it
            System.err.println("Parsing failed: " + t);
        }
    }

    public void startDocument() {
        displayText[numberLines] = indentation;
        displayText[numberLines] += "<?xml version=\"1.0\" encoding=\""+
            "UTF-8" + "\"?>";
        numberLines++;
    }

    public void processingInstruction(String target, String data) {
        displayText[numberLines] = indentation;
        displayText[numberLines] += "<?";
        displayText[numberLines] += target;
        if (data != null && data.length() > 0) {
            displayText[numberLines] += ' ';
            displayText[numberLines] += data;
        }
        displayText[numberLines] += "?>";
        numberLines++;
    }

    public void startElement(String uri, String localName,
        String qualifiedName, Attributes attributes) {
        displayText[numberLines] = indentation;

        // Four spaces per nesting level; endElement removes the same four
        indentation += "    ";

        displayText[numberLines] += '<';
        displayText[numberLines] += qualifiedName;
        if (attributes != null) {
            int numberAttributes = attributes.getLength();
            for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++){
                displayText[numberLines] += ' ';
                displayText[numberLines] += attributes.getQName(loopIndex);
                displayText[numberLines] += "=\"";
                displayText[numberLines] += attributes.getValue(loopIndex);
                displayText[numberLines] += '"';
            }
        }
        displayText[numberLines] += '>';
        numberLines++;
    }

    public void characters(char characters[], int start, int length) {
        String characterData = (new String(characters, start, length)).trim();
        if(characterData.indexOf("\n") < 0 && characterData.length() > 0) {
            displayText[numberLines] = indentation;
            displayText[numberLines] += characterData;
            numberLines++;
        }
    }

    public void ignorableWhitespace(char characters[], int start, int length) {
        //characters(characters, start, length);
    }

    public void endElement(String uri, String localName, String qualifiedName) {
        indentation = indentation.substring(0, indentation.length() - 4);
        displayText[numberLines] = indentation;
        displayText[numberLines] += "</";
        displayText[numberLines] += qualifiedName;
        displayText[numberLines] += '>';
        numberLines++;
    }

    public void warning(SAXParseException exception) {
        System.err.println("Warning: " + exception.getMessage());
    }

    public void error(SAXParseException exception) {
        System.err.println("Error: " + exception.getMessage());
    }

    public void fatalError(SAXParseException exception) {
        System.err.println("Fatal error: " + exception.getMessage());
    }
}