Using Java to Read XML Data

You're going to create a Java application to read in and extract all the data from the XML document you worked with yesterday, which you can see in Listing 16.1.

Listing 16.1. A Sample XML Document for Today's Work (ch16_01.xml)

<?xml version="1.0" encoding="UTF-8"?>
<session>
   <committee type="monetary">
       <title>Finance</title>
       <number>17</number>
       <subject>Donut Costs</subject>
       <date>7/15/2005</date>
       <attendees>
           <senator status="present">
               <firstName>Thomas</firstName>
               <lastName>Smith</lastName>
           </senator>
           <senator status="absent">
               <firstName>Frank</firstName>
               <lastName>McCoy</lastName>
           </senator>
           <senator status="present">
               <firstName>Jay</firstName>
               <lastName>Jones</lastName>
           </senator>
       </attendees>
   </committee>
</session>

The Java application you'll create shows how to read in the entire XML document and display it, much as you did in JavaScript yesterday. You start by creating a DocumentBuilderFactory object, which you use to create a DocumentBuilder object; the DocumentBuilder object is what actually reads in the XML document. Here's how to use DocumentBuilderFactory in the new application's main method, which is called when the application starts:

import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch16_02
{
    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
        .
        .
        .
    }

Table 16.1 lists the methods for the DocumentBuilderFactory class.

Table 16.1. Methods of the javax.xml.parsers.DocumentBuilderFactory Class

Method

What It Does

protected DocumentBuilderFactory()

Acts as the default DocumentBuilderFactory constructor.

abstract Object getAttribute(String name)

Returns attribute values.

boolean isCoalescing()

Returns True if the factory is configured to produce parsers that convert CDATA nodes to text nodes.

boolean isExpandEntityReferences()

Returns True if the factory is configured to produce parsers that expand XML entity reference nodes.

boolean isIgnoringComments()

Returns True if the factory is configured to produce parsers that ignore comments.

boolean isIgnoringElementContentWhitespace()

Returns True if the factory is configured to produce parsers that ignore ignorable whitespace (such as that used to indent elements) in element content.

boolean isNamespaceAware()

Returns True if the factory is configured to produce parsers that can use XML namespaces.

boolean isValidating()

Returns True if the factory is configured to produce parsers that validate the XML content during parsing operations.

abstract DocumentBuilder newDocumentBuilder()

Returns a new DocumentBuilder object.

static DocumentBuilderFactory newInstance()

Returns a new DocumentBuilderFactory object.

abstract void setAttribute(String name, Object value)

Sets attribute values.

void setCoalescing(boolean coalescing)

Specifies that the parser produced will convert CDATA nodes to text nodes.

void setExpandEntityReferences(boolean expandEntityRef)

Specifies that the parser produced will expand XML entity reference nodes.

void setIgnoringComments(boolean ignoreComments)

Specifies that the parser produced will ignore comments.

void setIgnoringElementContentWhitespace(boolean whitespace)

Specifies that the parser produced must eliminate ignorable whitespace.

void setNamespaceAware(boolean awareness)

Specifies that the parser produced will provide support for XML namespaces.

void setValidating(boolean validating)

Specifies that the parser produced will validate documents as they are parsed.
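
As a rough illustration of the setter and getter pairs in Table 16.1, the following sketch configures a factory before any parser is created from it. The class name FactoryConfigDemo, the helper configuredFactory, and the particular options chosen are assumptions made for illustration; they are not part of ch16_02.java.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class FactoryConfigDemo
{
    // Hypothetical helper: returns a factory whose parsers are namespace aware,
    // ignore comments, and convert CDATA sections to text nodes.
    static DocumentBuilderFactory configuredFactory()
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setIgnoringComments(true);
        factory.setCoalescing(true);
        return factory;
    }

    public static void main(String[] args)
    {
        DocumentBuilderFactory factory = configuredFactory();

        // The is...() methods report the settings applied above.
        System.out.println("Namespace aware:   " + factory.isNamespaceAware());
        System.out.println("Ignoring comments: " + factory.isIgnoringComments());
        System.out.println("Coalescing:        " + factory.isCoalescing());
        // Validation was not switched on, so this reports false.
        System.out.println("Validating:        " + factory.isValidating());
    }
}
```

The settings apply to parsers created after they are made, so the calls to the set methods come before newDocumentBuilder is called.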

To actually parse the XML document and extract data from it, we need a DocumentBuilder object, which is created by the DocumentBuilderFactory object. Here's what that looks like in code:

public class ch16_02
{
    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();

            DocumentBuilder builder = null;
            try {
                builder = factory.newDocumentBuilder();
            }
            catch (ParserConfigurationException e) {}
        .
        .
        .
    }

Table 16.2 lists the methods of the DocumentBuilder class.

Table 16.2. Methods of the javax.xml.parsers.DocumentBuilder Class

Method

What It Does

protected DocumentBuilder()

Acts as the default DocumentBuilder constructor.

abstract boolean isNamespaceAware()

Returns True if this parser is configured to understand namespaces.

abstract boolean isValidating()

Returns True if this parser is configured to validate XML documents.

abstract Document newDocument()

Returns a new instance of a DOM Document object to build a DOM tree with.

Document parse(File f)

Parses the content of the file as an XML document and returns a new DOM Document object.

Document parse(InputStream is)

Parses the content of the given InputStream object as an XML document and returns a new DOM Document object.

Document parse(InputStream is, String systemId)

Parses the content of an InputStream object as an XML document, using systemId as the base for resolving relative URIs, and returns a new DOM Document object.

Document parse(String uri)

Parses the content at the given URI as an XML document and returns a new DOM Document object.

abstract void setErrorHandler(ErrorHandler eh)

Sets the ErrorHandler object to be used to report errors.
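
The parse overloads in Table 16.2 can read from sources other than a named file. The following sketch, which assumes a hypothetical class ParseStreamDemo and helper parseString, feeds XML held in a string to parse(InputStream). It also reports a ParserConfigurationException instead of leaving the catch block empty, because an empty catch block would let the code continue with a null builder.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseStreamDemo
{
    // Hypothetical helper: parses XML held in a string by way of
    // the parse(InputStream) overload.
    static Document parseString(String xml)
    {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Report the failure rather than carrying on with a null result.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args)
    {
        Document document = parseString(
            "<session><committee type=\"monetary\"/></session>");
        // Prints the name of the document element: session
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```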

To run the application, the user passes the name of the XML document to read on the command line. In the following example, the application reads and displays the data in ch16_01.xml:

% java ch16_02 ch16_01.xml

You can access the name of the XML document the user wants to read as args[0]. Here's how to create a Java Document object that corresponds to the XML document, using the DocumentBuilder object:

import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch16_02
{
    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();

            DocumentBuilder builder = null;
            try {
                builder = factory.newDocumentBuilder();
            }
            catch (ParserConfigurationException e) {}

            Document document = null;
            document = builder.parse(args[0]);
        .
        .
        .
    }

At this point, then, we have a Java Document object (actually an org.w3c.dom.Document object) that corresponds to the XML document, and we can use the various methods of that object to work with the XML document. Table 16.3 lists the methods of Document objects.

Table 16.3. Methods of the org.w3c.dom.Document Interface

Method

What It Does

Attr createAttribute(String name)

Returns a new attribute object.

Attr createAttributeNS(String namespaceURI, String qualifiedName)

Returns a new attribute that has the given name and namespace.

CDATASection createCDATASection(String data)

Returns a new CDATASection node whose value is the given string.

Comment createComment(String data)

Returns a new Comment node created using the given string.

DocumentFragment createDocumentFragment()

Returns a new empty DocumentFragment object.

Element createElement(String tagName)

Returns a new element of the type given.

Element createElementNS(String namespaceURI, String qualifiedName)

Returns a new element of the given qualified name and namespace URI.

ProcessingInstruction createProcessingInstruction(String target, String data)

Returns a new ProcessingInstruction node.

Text createTextNode(String data)

Returns a new text node, given the specified string.

DocumentType getDoctype()

Returns the DTD for this document.

Element getDocumentElement()

Returns the document element, which is the root element of the document.

Element getElementById(String elementId)

Returns an element whose ID is given.

NodeList getElementsByTagName(String tagname)

Returns all elements with a given tag name.

NodeList getElementsByTagNameNS(String namespaceURI, String localName)

Returns all elements with a given name and namespace.

Node importNode(Node importedNode, boolean deep)

Imports a node from another document.
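
To give a feel for the Document methods in Table 16.3, here is a minimal sketch that uses getElementsByTagName to pull out every last name from a shortened copy of the attendees in ch16_01.xml. The class TagNameDemo, its SAMPLE string, and the helper textOfElements are hypothetical names introduced for this example, and the helper assumes that each matching element holds a single text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TagNameDemo
{
    // A shortened copy of the attendees from ch16_01.xml.
    static final String SAMPLE =
        "<attendees>"
        + "<senator status=\"present\"><lastName>Smith</lastName></senator>"
        + "<senator status=\"absent\"><lastName>McCoy</lastName></senator>"
        + "<senator status=\"present\"><lastName>Jones</lastName></senator>"
        + "</attendees>";

    // Hypothetical helper: collects the text of every element that has the
    // given tag name. Assumes each match holds exactly one text node.
    static List<String> textOfElements(String xml, String tagName)
    {
        List<String> values = new ArrayList<>();
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = document.getElementsByTagName(tagName);
            for (int loopIndex = 0; loopIndex < matches.getLength(); loopIndex++) {
                values.add(
                    matches.item(loopIndex).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
        return values;
    }

    public static void main(String[] args)
    {
        // Prints [Smith, McCoy, Jones]
        System.out.println(textOfElements(SAMPLE, "lastName"));
    }
}
```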

The next step is to work through the XML document recursively, as you did with JavaScript yesterday. You'll do that in a method named childLoop that you can call recursively. Just as you did with JavaScript, you'll also pass an indentation string to this method, which will be increased for each successive generation of a node's children. This method will fill an array of strings, displayText, with the XML data from the document and store the total number of strings in the array in a variable named numberLines. When childLoop is done filling the array of strings, you'll display them, like this:

import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch16_02
{
    static String displayText[] = new String[1000];
    static int numberLines = 0;

    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();

            DocumentBuilder builder = null;
            try {
                builder = factory.newDocumentBuilder();
            }
            catch (ParserConfigurationException e) {}

            Document document = null;
            document = builder.parse(args[0]);

            childLoop(document, "");

        } catch (Exception e) {
            e.printStackTrace(System.err);
        }

        for(int loopIndex = 0; loopIndex < numberLines; loopIndex++){
            System.out.println(displayText[loopIndex]);
        }
    }

The next order of business is to write childLoop, the method that will loop over all nodes in the XML document and store their data in the displayText array.

Looping Over Nodes

As in JavaScript, you're using the W3C DOM in Java today, so you're treating the XML document as a tree of nodes. Table 16.4 lists the fields of the Java org.w3c.dom.Node interface, and Table 16.5 lists its methods.

Table 16.4. Fields of the org.w3c.dom.Node Interface

Field

Stands For

static short ATTRIBUTE_NODE

An attribute

static short CDATA_SECTION_NODE

A CDATA section

static short COMMENT_NODE

A comment

static short DOCUMENT_FRAGMENT_NODE

A document fragment

static short DOCUMENT_NODE

A document

static short DOCUMENT_TYPE_NODE

A DTD

static short ELEMENT_NODE

An element

static short ENTITY_NODE

An entity

static short ENTITY_REFERENCE_NODE

An entity reference

static short NOTATION_NODE

A notation

static short PROCESSING_INSTRUCTION_NODE

A processing instruction

static short TEXT_NODE

A text node

Table 16.5. Methods of the org.w3c.dom.Node Interface

Method

What It Does

Node appendChild(Node newChild)

Appends the given node to the end of the children of the current node.

NamedNodeMap getAttributes()

Returns the attributes of an element node.

NodeList getChildNodes()

Gets the children of this node.

Node getFirstChild()

Gets the first child of this node.

Node getLastChild()

Gets the last child of this node.

String getLocalName()

Gets the local part of the full name of this node.

String getNamespaceURI()

Gets the namespace URI of this node.

Node getNextSibling()

Gets the node following this node.

String getNodeName()

Gets the name of this node.

short getNodeType()

Gets the type of this node.

String getNodeValue()

Gets the value of this node.

Document getOwnerDocument()

Gets the Document object for this node.

Node getParentNode()

Gets the parent of this node.

String getPrefix()

Gets the namespace prefix of this node.

Node getPreviousSibling()

Gets the node preceding the current node.

boolean hasAttributes()

Returns True if this node has any attributes.

boolean hasChildNodes()

Returns True if this node has any children.

Node insertBefore(Node newChild, Node refChild)

Inserts the new node before a reference child node.

void normalize()

Puts the text nodes beneath this node into normal form by merging adjacent text nodes and removing empty ones.

Node removeChild(Node oldChild)

Removes a child node and returns it.

Node replaceChild(Node newChild, Node oldChild)

Replaces the child node.

void setNodeValue(String nodeValue)

Sets the node's value.

void setPrefix(String prefix)

Sets the namespace prefix of the node.
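
The navigation methods in Table 16.5 can be combined with the type constants in Table 16.4. The sketch below, built around a hypothetical class SiblingWalkDemo and helper childElementNames, walks the children of the document element with getFirstChild and getNextSibling and keeps only the element nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

public class SiblingWalkDemo
{
    // Hypothetical helper: names of the element children of the document
    // element, found by stepping from sibling to sibling.
    static List<String> childElementNames(String xml)
    {
        List<String> names = new ArrayList<>();
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            for (Node child = root.getFirstChild(); child != null;
                 child = child.getNextSibling()) {
                // Skip text nodes (such as indentation) and other node types.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
        return names;
    }

    public static void main(String[] args)
    {
        String xml = "<committee>\n"
            + "    <title>Finance</title>\n"
            + "    <number>17</number>\n"
            + "</committee>";
        // Prints [title, number]
        System.out.println(childElementNames(xml));
    }
}
```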

The childLoop method has a node and an indentation string passed to it. To handle the current node and store its data in the displayText array, first check that the current node is not null, and if it isn't, get its type by using the getNodeType method:

public static void childLoop(Node node, String indentation)
{
    if (node == null) {
        return;
    }

    int type = node.getNodeType();
        .
        .
        .
}

Now that you know the type of the node you've been passed, how do you handle it and store its data in the array of strings that will be printed out? You have to handle different types of nodes in different ways, and in this case, you'll use a Java switch statement to work with different node types, starting with the document node itself.

Handling Document Nodes

You can compare the type of the current node to the fields listed in Table 16.4 to determine what kind of node you're dealing with. For example, if the current node is a document node, you'll store a generic XML declaration in the displayText array, increment the array's index value, numberLines, and then call childLoop for the document element, like this:

public static void childLoop(Node node, String indentation)
{
    if (node == null) {
        return;
    }

    int type = node.getNodeType();

    switch (type) {
        case Node.DOCUMENT_NODE: {
            displayText[numberLines] = indentation;
            displayText[numberLines] += "<?xml version=\"1.0\" encoding=\""+
                "UTF-8" + "\"?>";
            numberLines++;
            childLoop(((Document)node).getDocumentElement(), "");
            break;
        }
        .
        .
        .

Now you've displayed a generic XML declaration for the start of the XML document. Next, you'll handle elements.

Handling Elements

Elements have the type Node.ELEMENT_NODE, and you can get the name of the element by using the W3C DOM method getNodeName. Here's what it looks like in the childLoop method's switch statement, which lets you handle the various node types:

case Node.ELEMENT_NODE: {
    displayText[numberLines] = indentation;
    displayText[numberLines] += "<";
    displayText[numberLines] += node.getNodeName();
        .
        .
        .

This gives us the name of the current element, but what if it has attributes? You'll check that next.

Handling Attributes

To see whether the element you're working on has any attributes, you can use the getAttributes method, which returns a NamedNodeMap object containing the element's attributes. If there are any attributes, you'll store them in an array, and then use the getNodeName method to get each attribute's name and the getNodeValue method to get its value:

int length = (node.getAttributes() != null) ?
    node.getAttributes().getLength() : 0;
Attr attributes[] = new Attr[length];
for (int loopIndex = 0; loopIndex < length; loopIndex++) {
    attributes[loopIndex] = (Attr)node.getAttributes().item(loopIndex);
}

for (int loopIndex = 0; loopIndex < attributes.length; loopIndex++) {
    Attr attribute = attributes[loopIndex];
    displayText[numberLines] += " ";
    displayText[numberLines] += attribute.getNodeName();
    displayText[numberLines] += "=\"";
    displayText[numberLines] += attribute.getNodeValue();
    displayText[numberLines] += "\"";
}
displayText[numberLines] += ">";

numberLines++;
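
The loop above copies each attribute value into the output exactly as the parser reports it. Because the parser has already expanded entity references, a value that contains an ampersand, a less-than sign, or a double quote would produce output that is no longer well-formed. One possible way to guard against that is sketched here; the class AttributeEscapeDemo and the helper escapeAttribute are hypothetical additions, not part of ch16_02.java.

```java
public class AttributeEscapeDemo
{
    // Hypothetical helper: replaces the characters that cannot appear
    // literally inside a double-quoted attribute value.
    static String escapeAttribute(String value)
    {
        StringBuilder escaped = new StringBuilder();
        for (int loopIndex = 0; loopIndex < value.length(); loopIndex++) {
            char c = value.charAt(loopIndex);
            switch (c) {
                case '&':
                    escaped.append("&amp;");
                    break;
                case '<':
                    escaped.append("&lt;");
                    break;
                case '"':
                    escaped.append("&quot;");
                    break;
                default:
                    escaped.append(c);
            }
        }
        return escaped.toString();
    }

    public static void main(String[] args)
    {
        // Prints: Smith &amp; &quot;Sons&quot;
        System.out.println(escapeAttribute("Smith & \"Sons\""));
    }
}
```

With a helper like this in place, the line that appends attribute.getNodeValue() could append escapeAttribute(attribute.getNodeValue()) instead.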

Table 16.6 lists the methods of NamedNodeMap.

Table 16.6. NamedNodeMap Methods

Method

What It Does

int getLength()

Returns the number of nodes.

Node getNamedItem(java.lang.String name)

Gets a node specified by the name.

Node getNamedItemNS(java.lang.String namespaceURI, java.lang.String localName)

Gets a node specified by the local name and namespace URI.

Node item(int index)

Gets an item in the map by index.

Node removeNamedItem(java.lang.String name)

Removes a node.

Node removeNamedItemNS(java.lang.String namespaceURI, java.lang.String localName)

Removes the given node with a local name and namespace URI.
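
When you want one particular attribute rather than all of them, getNamedItem saves you the loop. The following sketch assumes a hypothetical class NamedItemDemo with a helper rootAttribute that looks up a single attribute on the document element and returns null when the attribute is missing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class NamedItemDemo
{
    // Hypothetical helper: value of the named attribute on the document
    // element, or null when there is no such attribute.
    static String rootAttribute(String xml, String attributeName)
    {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            NamedNodeMap attributes =
                document.getDocumentElement().getAttributes();
            // getNamedItem returns null if the name is not in the map.
            Node attribute = attributes.getNamedItem(attributeName);
            return (attribute == null) ? null : attribute.getNodeValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args)
    {
        String xml = "<committee type=\"monetary\"/>";
        System.out.println(rootAttribute(xml, "type"));    // prints monetary
        System.out.println(rootAttribute(xml, "status"));  // prints null
    }
}
```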

Table 16.7 lists the methods of Attr objects, which hold attributes.

Table 16.7. Attr Interface Methods

Method

What It Does

java.lang.String getName()

Gets the name of this attribute.

Element getOwnerElement()

Gets this attribute's element node.

boolean getSpecified()

Returns True if this attribute was explicitly given a value in the original document.

java.lang.String getValue()

Gets the value of the attribute.

void setValue(String value)

Sets the value of the attribute.

Now you've handled the current element's name and attributes. But what if the element has child nodes, such as text nodes or child elements? That's coming up next.

Handling Child Nodes

Elements can have child nodes, so before you finish up with elements, you'll also loop over those child nodes by calling childLoop recursively. The following code gets a NodeList of child nodes by using the getChildNodes method, increases the indentation level by four spaces, and calls childLoop for each child node:

    NodeList childNodes = node.getChildNodes();
    if (childNodes != null) {
        length = childNodes.getLength();
        indentation += "    ";
        for (int loopIndex = 0; loopIndex < length; loopIndex++ ) {
           childLoop(childNodes.item(loopIndex), indentation);
        }
    }
    break;
}

The NodeList interface supports an ordered collection of nodes. Table 16.8 lists the methods of the NodeList interface.

Table 16.8. NodeList Methods

Method

What It Does

int getLength()

Returns the number of nodes.

Node item(int index)

Gets the item at a specified index.
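
One thing worth seeing for yourself is that the NodeList returned by getChildNodes includes the whitespace used to indent a document, because a parser working without a DTD keeps that whitespace as text nodes. The sketch below uses a hypothetical class ChildCountDemo and helper countChildren to compare the total number of child nodes with the number of child elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildCountDemo
{
    // Hypothetical helper: returns { all child nodes, element child nodes }
    // for the document element.
    static int[] countChildren(String xml)
    {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            NodeList children = root.getChildNodes();
            int elements = 0;
            for (int loopIndex = 0; loopIndex < children.getLength(); loopIndex++) {
                if (children.item(loopIndex).getNodeType() == Node.ELEMENT_NODE) {
                    elements++;
                }
            }
            return new int[] { children.getLength(), elements };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args)
    {
        String xml = "<attendees>\n"
            + "    <senator/>\n"
            + "    <senator/>\n"
            + "</attendees>";
        int[] counts = countChildren(xml);
        // Three whitespace text nodes surround the two elements.
        System.out.println(counts[0] + " child nodes, "
            + counts[1] + " of them elements");
    }
}
```

This is why childLoop has to deal with text nodes that contain nothing but indentation.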

Now that you have handled elements, attributes, and child nodes, you'll take a look at how to work with text nodes.

Handling Text Nodes

Text nodes are of type Node.TEXT_NODE, and after you check to make sure a node is a valid text node, you can trim extra spaces (such as indentation text) from the text node's value and add it to the displayText array, like this:

case Node.TEXT_NODE: {
    displayText[numberLines] = indentation;
    String trimmedText = node.getNodeValue().trim();
    if(trimmedText.indexOf("\n") < 0 && trimmedText.length() > 0) {
        displayText[numberLines] += trimmedText;
        numberLines++;
    }
    break;
}
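
To see which text nodes the test in this case keeps, you can try the same condition on a few strings. The class TextFilterDemo and the helper textToStore in the following sketch are hypothetical; the helper only mirrors the if statement above. Note that the condition skips whitespace-only text, and it also skips any text that still contains a newline after trimming.

```java
public class TextFilterDemo
{
    // Mirrors the test in the TEXT_NODE case: returns the trimmed text that
    // would be stored, or null when the text node would be skipped.
    static String textToStore(String nodeValue)
    {
        String trimmedText = nodeValue.trim();
        if (trimmedText.indexOf("\n") < 0 && trimmedText.length() > 0) {
            return trimmedText;
        }
        return null;
    }

    public static void main(String[] args)
    {
        System.out.println(textToStore("  Finance  "));   // prints Finance
        System.out.println(textToStore("\n        "));    // prints null
        System.out.println(textToStore("Donut\nCosts"));  // prints null
    }
}
```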

Handling Processing Instructions

Handling processing instructions is not difficult; you use the getNodeName method to get the processing instruction's target and the getNodeValue method to get its data. Here's how that works in the childLoop method's switch statement:

case Node.PROCESSING_INSTRUCTION_NODE: {
    displayText[numberLines] = indentation;
    displayText[numberLines] += "<?";
    displayText[numberLines] += node.getNodeName();
    String text = node.getNodeValue();
    if (text != null && text.length() > 0) {
        // The data does not include the space that separates it from the target.
        displayText[numberLines] += " " + text;
    }
    displayText[numberLines] += "?>";
    numberLines++;
    break;
}
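
For a processing instruction node, getNodeName returns the target and getNodeValue returns the data, and the data does not include the whitespace that separates the two in the document. The following sketch builds a processing instruction in memory and formats it, putting a space back between target and data. The class ProcessingInstructionDemo and the helper format are hypothetical names used only here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;

public class ProcessingInstructionDemo
{
    // Hypothetical helper: creates a processing instruction node and
    // formats it the way it would appear in a document.
    static String format(String target, String data)
    {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            ProcessingInstruction instruction =
                document.createProcessingInstruction(target, data);

            String text = "<?" + instruction.getNodeName();   // the target
            String value = instruction.getNodeValue();        // the data
            if (value != null && value.length() > 0) {
                text += " " + value;
            }
            return text + "?>";
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    public static void main(String[] args)
    {
        // Prints <?xml-stylesheet type="text/css" href="style.css"?>
        System.out.println(
            format("xml-stylesheet", "type=\"text/css\" href=\"style.css\""));
    }
}
```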

Handling CDATA Sections

Handling CDATA sections is just as easy as handling other nodes: You just use the getNodeValue method to get the CDATA section's data. Here's what that looks like in the childLoop method's switch statement:

    case Node.CDATA_SECTION_NODE: {
        displayText[numberLines] = indentation;
        displayText[numberLines] += "<![CDATA[";
        displayText[numberLines] += node.getNodeValue();
        displayText[numberLines] += "]]>";
        numberLines++;
        break;
    }
}   // closes the switch statement
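
Whether childLoop ever sees a CDATA section node depends on how the factory was configured. The sketch below, which assumes a hypothetical class CdataCoalescingDemo and helper firstChildType, parses the same element twice: once with the default settings and once after calling setCoalescing(true), which tells the parser to convert CDATA sections to text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

public class CdataCoalescingDemo
{
    // Hypothetical helper: node type of the document element's first child,
    // parsed with or without coalescing.
    static short firstChildType(String xml, boolean coalescing)
    {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing);
            Node root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return root.getFirstChild().getNodeType();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args)
    {
        String xml = "<subject><![CDATA[Donuts < Bagels]]></subject>";
        System.out.println("CDATA node by default: "
            + (firstChildType(xml, false) == Node.CDATA_SECTION_NODE));
        System.out.println("Text node when coalescing: "
            + (firstChildType(xml, true) == Node.TEXT_NODE));
    }
}
```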

Ending Elements

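Your last task is to add a closing tag for element nodes. Up to this point, you've displayed only an opening tag for each element and its attributes, but no closing tag. Because the ELEMENT_NODE case added four spaces to the indentation string before looping over the child nodes, the closing tag is written with those four spaces removed so that it lines up with its opening tag. Here's how to add the closing tag with some code at the end of the childLoop method: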

    if (type == Node.ELEMENT_NODE) {
        displayText[numberLines] = indentation.substring(0,
            indentation.length() - 4);
        displayText[numberLines] += "</";
        displayText[numberLines] += node.getNodeName();
        displayText[numberLines] += ">";
        numberLines++;
        indentation += "    ";
    }
}
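
The overall pattern (write the opening tag, recurse into the children with a deeper indentation, then write the closing tag) can also be sketched without a fixed-size array. The following is a simplified variant under stated assumptions, not the chapter's application: the class TreeLinesDemo and its methods collect and linesOf are hypothetical, lines are gathered in a List, attributes are left out, and only element and text nodes are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TreeLinesDemo
{
    // Adds one line per opening tag, non-blank text node, and closing tag.
    static void collect(Node node, String indentation, List<String> lines)
    {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (text.length() > 0) {
                lines.add(indentation + text);
            }
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;   // other node types are ignored in this sketch
        }

        lines.add(indentation + "<" + node.getNodeName() + ">");
        NodeList children = node.getChildNodes();
        for (int loopIndex = 0; loopIndex < children.getLength(); loopIndex++) {
            // Children get a deeper indentation; this node's own is unchanged.
            collect(children.item(loopIndex), indentation + "    ", lines);
        }
        lines.add(indentation + "</" + node.getNodeName() + ">");
    }

    // Hypothetical helper: parses the XML and returns the display lines.
    static List<String> linesOf(String xml)
    {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            List<String> lines = new ArrayList<>();
            collect(root, "", lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args)
    {
        for (String line : linesOf(
                "<committee><title>Finance</title></committee>")) {
            System.out.println(line);
        }
    }
}
```

Passing a longer indentation string to each recursive call, instead of changing the current one, means the closing tag can reuse the same indentation as the opening tag without any call to substring.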

That's it; Listing 16.2 shows all the code in ch16_02.java.

Listing 16.2. Parsing XML Documents by Using Java (ch16_02.java)

import javax.xml.parsers.*;
import org.w3c.dom.*;

public class ch16_02
{
    static String displayText[] = new String[1000];
    static int numberLines = 0;

    public static void main(String args[])
    {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();

            DocumentBuilder builder = null;
            try {
                builder = factory.newDocumentBuilder();
            }
            catch (ParserConfigurationException e) {}

            Document document = null;
            document = builder.parse(args[0]);

            childLoop(document, "");

        } catch (Exception e) {
            e.printStackTrace(System.err);
        }

        for(int loopIndex = 0; loopIndex < numberLines; loopIndex++){
            System.out.println(displayText[loopIndex]);
        }
    }

    public static void childLoop(Node node, String indentation)
    {
        if (node == null) {
            return;
        }

        int type = node.getNodeType();

        switch (type) {
            case Node.DOCUMENT_NODE: {
                displayText[numberLines] = indentation;
                displayText[numberLines] +=
                    "<?xml version=\"1.0\" encoding=\""+
                  "UTF-8" + "\"?>";
                numberLines++;
                childLoop(((Document)node).getDocumentElement(), "");
                break;
             }

             case Node.ELEMENT_NODE: {
                 displayText[numberLines] = indentation;
                 displayText[numberLines] += "<";
                 displayText[numberLines] += node.getNodeName();

                 int length = (node.getAttributes() != null) ?
                     node.getAttributes().getLength() : 0;
                 Attr attributes[] = new Attr[length];
                 for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                     attributes[loopIndex] =
                         (Attr)node.getAttributes().item(loopIndex);
                 }

                 for (int loopIndex = 0; loopIndex < attributes.length;
                     loopIndex++) {
                     Attr attribute = attributes[loopIndex];
                     displayText[numberLines] += " ";
                     displayText[numberLines] += attribute.getNodeName();
                     displayText[numberLines] += "=\"";
                     displayText[numberLines] += attribute.getNodeValue();
                     displayText[numberLines] += "\"";
                 }
                 displayText[numberLines] += ">";

                 numberLines++;

                 NodeList childNodes = node.getChildNodes();
                 if (childNodes != null) {
                     length = childNodes.getLength();
                     indentation += "    ";
                     for (int loopIndex = 0; loopIndex < length; loopIndex++ ) {
                        childLoop(childNodes.item(loopIndex), indentation);
                     }
                 }
                 break;
             }

             case Node.TEXT_NODE: {
                 // Display only single-line text; whitespace-only nodes
                 // (such as the indentation between elements) are skipped.
                 displayText[numberLines] = indentation;
                 String trimmedText = node.getNodeValue().trim();
                 if(trimmedText.indexOf("\n") < 0 && trimmedText.length() > 0){
                     displayText[numberLines] += trimmedText;
                     numberLines++;
                 }
                 break;
             }

             case Node.PROCESSING_INSTRUCTION_NODE: {
                 displayText[numberLines] = indentation;
                 displayText[numberLines] += "<?";
                 displayText[numberLines] += node.getNodeName();
                 String text = node.getNodeValue();
                 if (text != null && text.length() > 0) {
                     // A space must separate the PI target from its data.
                     displayText[numberLines] += " " + text;
                 }
                 displayText[numberLines] += "?>";
                 numberLines++;
                 break;
             }

             case Node.CDATA_SECTION_NODE: {
                 displayText[numberLines] = indentation;
                 displayText[numberLines] += "<![CDATA[";
                 displayText[numberLines] += node.getNodeValue();
                 displayText[numberLines] += "]]>";
                 numberLines++;
                 break;
            }
        }

        // An element also needs a closing tag, written one level back
        // out from the indentation used for its children.
        if (type == Node.ELEMENT_NODE) {
            displayText[numberLines] = indentation.substring(0,
                indentation.length() - 4);
            displayText[numberLines] += "</";
            displayText[numberLines] += node.getNodeName();
            displayText[numberLines] += ">";
            numberLines++;
        }
    }
}

Now compile ch16_02.java by using javac, the Java compiler:

%javac ch16_02.java

This creates ch16_02.class, which is ready to run. Run the class as follows, passing it ch16_01.xml, to extract all the data from that document:

%java ch16_02 ch16_01.xml

Figure 16.1 shows the results of this example in a Windows MS-DOS window. So far today, you've been able to extract all the data in an XML document, format it, and display it.

Figure 16.1 Parsing an XML document by using Java.
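
If you want to experiment with the same recursive technique without a fixed-size array, the following is a minimal sketch rather than one of the book's listings. It assumes the standard JAXP DocumentBuilderFactory parser, collects the output lines in a List, and handles only elements and text. The class name DomOutline, its outline and walk methods, and the sample document are all hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; not part of the book's listings.
class DomOutline {

    // Parses XML text and returns one indented line per tag or text run.
    static List<String> outline(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document =
            builder.parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<>();
        walk(document.getDocumentElement(), "", lines);
        return lines;
    }

    // Recursive walk; unlike childLoop, it ignores processing
    // instructions and CDATA sections to keep the sketch short.
    private static void walk(Node node, String indentation,
                             List<String> lines) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Opening tag with any attributes.
            StringBuilder tag = new StringBuilder(indentation)
                .append("<").append(node.getNodeName());
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                tag.append(" ").append(attribute.getNodeName())
                   .append("=\"").append(attribute.getNodeValue())
                   .append("\"");
            }
            lines.add(tag.append(">").toString());

            // Children are indented four more spaces than their parent.
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), indentation + "    ", lines);
            }

            // Closing tag at the element's own indentation.
            lines.add(indentation + "</" + node.getNodeName() + ">");
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Skip whitespace-only text between elements.
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                lines.add(indentation + text);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, not ch16_01.xml.
        String xml = "<session><committee type=\"monetary\">"
            + "<title>Finance</title></committee></session>";
        for (String line : outline(xml)) {
            System.out.println(line);
        }
    }
}
```

Because each recursive call receives its own indentation string, the closing tag can reuse the element's indentation directly, and no substring arithmetic is needed.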
