- Sams Teach Yourself XML in 21 Days, Third Edition
Working with the select Attribute and XPath
You can assign XPath expressions to the select attribute to indicate exactly which node or nodes in an XML document you want to use. XPath has been a W3C recommendation since November 16, 1999. You can find the XPath recommendation for the current version, 1.0, at http://www.w3.org/TR/xpath. Version 2.0 of XPath is on the way and is currently in working draft form; see http://www.w3.org/TR/xpath20. (Very little software supports XPath 2.0 yet; the Saxon XSLT processor, at http://saxon.sourceforge.net, provides some support for it.)
XPath expressions are more powerful than the match expressions you've already seen; for one thing, they're not restricted to working with the current node or direct child nodes; you can use them to work with parent nodes, ancestor nodes, and more.
To specify a node or set of nodes in XPath, you use a location path. A location path consists of one or more location steps, separated by / (to refer to a child node) or // (to refer to any descendant node). If you start the location path with /, the location path is called an absolute location path because you're specifying the path from the root node; otherwise, the location path is relative. The node that an XPath expression is evaluated against is called the context node.
Location steps are made up of an axis, a node test, and zero or more predicates. For example, in the expression child::state[position() = 2] (which picks out the second <state> child of the context node), child is the name of the axis, state is the node test, and [position() = 2] is a predicate. You can create location paths with one or more location steps. For example, /descendant::state/child::name selects all the <name> elements that have a <state> parent. You'll get the details about what kind of axes, node tests, and predicates XPath supports in the following sections.
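To make the anatomy of a location step concrete, here is a minimal Python sketch that splits simple steps into an axis, a node test, and predicates. The helper names split_step and split_path are made up for this illustration, and the regular expression copes only with simple steps joined by single slashes, not with the full XPath grammar.

```python
import re

# Hypothetical helpers for illustration only. They cope with simple steps
# such as "child::state[position() = 2]", not with the full XPath grammar.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"      # optional axis name followed by ::
    r"(?P<test>[^\[\]]+)"             # node test, e.g. state or node()
    r"(?P<preds>(?:\[[^\[\]]*\])*)$"  # zero or more [predicate] parts
)

def split_step(step):
    """Return (axis, node_test, predicates) for one simple location step."""
    match = STEP.match(step.strip())
    if match is None:
        raise ValueError(f"cannot parse location step: {step!r}")
    axis = match.group("axis") or "child"  # child is the default axis
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, match.group("test"), predicates

def split_path(path):
    """Return (is_absolute, steps) for a path whose steps are joined by single slashes."""
    is_absolute = path.startswith("/")
    steps = [split_step(step) for step in path.strip("/").split("/") if step]
    return is_absolute, steps

print(split_step("child::state[position() = 2]"))
print(split_path("/descendant::state/child::name"))
```

A step written without an axis, such as state, comes back with the child axis, which mirrors the default that XPath itself applies.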
Using Axes
In the location step child::bird, which refers to a <bird> element that is a child of the current node, child is called the axis. XPath supports many different axes, and it's important to know what they are. Here's the list:
- ancestor — This axis contains the ancestors of the context node. An ancestor node is the parent of the context node, the parent of the parent, and so forth, back to (and including) the root node.
- ancestor-or-self — This axis contains the context node and the ancestors of the context node.
- attribute — This axis contains the attributes of the context node.
- child — This axis contains the children of the context node.
- descendant — This axis contains the descendants of the context node. A descendant is a child or a child of a child and so on.
- descendant-or-self — This axis contains the context node and the descendants of the context node.
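- following — This axis contains all nodes that come after the context node in document order, excluding the context node's descendants as well as attribute and namespace nodes.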
- following-sibling — This axis contains all the following siblings of the context node.
- namespace — This axis contains the namespace nodes of the context node.
- parent — This axis contains the parent of the context node.
- preceding — This axis contains all nodes that come before the context node in document order, excluding the context node's ancestors as well as attribute and namespace nodes.
- preceding-sibling — This axis contains all the preceding siblings of the context node.
- self — This axis contains the context node.
Note that although the match attribute can only use the child or attribute axes in location steps (that's the major restriction on the match attribute compared to the select attribute), the select attribute can use any of the 13 axes. (The term sibling in XML refers to an item on the same level as the current item.)
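If it helps to see what a few of these axes contain, the following Python sketch emulates them with the standard library's xml.etree.ElementTree. ElementTree does not understand axis names itself, so the functions below (all hypothetical names) walk the tree by hand. The sample markup is a trimmed stand-in for today's states document, and its exact structure is an assumption. Unlike XPath's ancestor axis, this sketch stops at the document element because ElementTree does not model the root node.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the sample document; the exact markup is an assumption.
states = ET.fromstring("""
<states>
  <state>
    <name>California</name> <population units="people">33871648</population>
    <bird>Quail</bird> <area units="square miles">155959</area>
  </state>
  <state>
    <name>Massachusetts</name> <population units="people">6349097</population>
    <bird>Chickadee</bird> <area units="square miles">7840</area>
  </state>
</states>
""")

# Elements carry no parent pointer in ElementTree, so build a child-to-parent map.
parent_of = {child: parent for parent in states.iter() for child in parent}

def ancestors(node):
    """Element ancestors of node, nearest first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def descendants(node):
    """All element descendants of node, in document order."""
    return list(node.iter())[1:]  # iter() yields node itself first

def following_siblings(node):
    """Sibling elements that come after node."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """Sibling elements that come before node, listed in document order."""
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]

bird = states.find("state/bird")  # context node: the first <bird> element
print([e.tag for e in ancestors(bird)])
print([e.tag for e in preceding_siblings(bird)])
print([e.tag for e in following_siblings(bird)])
```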
For example, this template extracts the value of the <name> element by using the location path child::name:
    <xsl:template match="state">
        <HTML>
            <BODY>
                <xsl:value-of select="child::name"/>
            </BODY>
        </HTML>
    </xsl:template>
This is really the same as the version you've already been using because, as mentioned, you can abbreviate it by omitting the child:: part:
    <xsl:template match="state">
        <HTML>
            <BODY>
                <xsl:value-of select="name"/>
            </BODY>
        </HTML>
    </xsl:template>
In the location step child::name, child is the axis and name is the node test, which is described in the following section.
Using Node Tests
After you specify the axis you want to use in a location step, you specify the node test. A node test indicates what type of node you want to match. You can use names of nodes as node tests, or you can use the wildcard * to select element nodes. For example, the expression child::*/child::flower selects all <flower> elements that are grandchildren of the current node. Besides node names and the wildcard character, you can also use these node tests:
- comment() — This node test selects comment nodes.
- node() — This node test selects any type of node.
- processing-instruction() — This node test selects a processing instruction node. You can specify, in the parentheses, the name of the processing instruction to select.
- text() — This node test selects a text node.
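The node tests above correspond closely to DOM node types. As a rough illustration, this Python sketch uses the standard library's xml.dom.minidom to sort the children of one element by node type. The tiny document, the NODE_TESTS table, and the select helper are all invented for this example; minidom does not evaluate XPath itself.

```python
from xml.dom import minidom, Node

# A made-up fragment containing one child of each kind.
doc = minidom.parseString(
    "<state><!-- the Golden State --><?sort alpha?>"
    "<name>California</name>Pacific coast</state>"
)
children = doc.documentElement.childNodes

# Each XPath node test is mapped to a check on the DOM node type.
NODE_TESTS = {
    "comment()": lambda n: n.nodeType == Node.COMMENT_NODE,
    "processing-instruction()": lambda n: n.nodeType == Node.PROCESSING_INSTRUCTION_NODE,
    "text()": lambda n: n.nodeType == Node.TEXT_NODE,
    "*": lambda n: n.nodeType == Node.ELEMENT_NODE,
    "node()": lambda n: True,
}

def select(test):
    """Children of <state> that pass the given node test."""
    return [n for n in children if NODE_TESTS[test](n)]

for test in NODE_TESTS:
    print(test, len(select(test)))
```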
Using Predicates
The last part of a location step is the predicate. In a location step, the (optional) predicate narrows the search down even more. For example, the location step child::state[position() = 1] uses the predicate [position() = 1] to select not just a child <state> element but the first <state> child element.
Predicates can get pretty involved because there are all kinds of XPath expressions that you can work with in predicates. And there are various types of legal XPath expressions; here are the possible types:
- Booleans
- Node sets
- Numbers
- Strings
The following sections look at how expressions help you in XSLT.
Boolean Expressions
XPath Boolean values are true/false values, and you can use the built-in XPath logical operators to produce Boolean results. These are the logical operators:
- != — This stands for "is not equal to."
- < — This stands for "is less than." (You use &lt; for this in XML documents.)
- <= — This stands for "is less than or equal to."
- = — This stands for "is equal to."
- > — This stands for "is greater than."
- >= — This stands for "is greater than or equal to."
For example, here's how to use a logical operator to match all <state> elements after the first three, using the position() function (which you'll see in the next section):
    <xsl:template match="state[position() > 3]">
        <xsl:value-of select="."/>
    </xsl:template>
You can also use the keywords and and or to connect Boolean expressions. The following example selects all <state> elements after the first three and before the tenth one:
    <xsl:template match="state[position() > 3 and position() &lt; 10]">
        <xsl:value-of select="."/>
    </xsl:template>
In addition, you can use the not() function to reverse the logical sense of an expression. The following example selects all <state> elements except the last one, using the last() function (which you'll see in the next section):
    <xsl:template match="state[not(position() = last())]">
        <xsl:value-of select="."/>
    </xsl:template>
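To check your understanding of these three predicates, you can mimic them in a few lines of Python. This is only a sketch of the logic, not an XPath evaluator: the twelve node names are invented, and position and last are ordinary Python values standing in for the XPath functions.

```python
# Twelve hypothetical <state> nodes, named state1 through state12.
names = [f"state{n}" for n in range(1, 13)]
last = len(names)  # what last() would return for this node set

def select(predicate):
    """Keep the nodes whose 1-based position satisfies the predicate."""
    return [name for position, name in enumerate(names, start=1) if predicate(position)]

after_third = select(lambda position: position > 3)
middle = select(lambda position: position > 3 and position < 10)
all_but_last = select(lambda position: not (position == last))

print(after_third)
print(middle)
print(all_but_last)
```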
Node Sets
Besides Boolean values, XPath can also work with node sets. A node set is just a set of nodes. By collecting nodes into a set, XPath lets you work with multiple nodes at once. For example, the location path child::state/child::bird returns a node set of all <bird> elements that are children of <state> elements.
You can use various XPath functions to work with node sets. For example, the last() function returns the number of nodes in the node set, which lets you pick out the last node with a predicate such as [position() = last()]. The following are the node set functions:
- last() — Returns the number of nodes in the node set.
- position() — Returns the position of the context node in the node set. (The first node is Node 1.)
- count( node-set ) — Returns the number of nodes in node-set.
- id( ID ) — Returns a node set that contains the element whose ID value matches ID.
- local-name( node-set ) — Returns the local name (the name without any namespace prefix) of the first node in node-set.
- namespace-uri( node-set ) — Returns the URI of the namespace of the first node in node-set.
- name( node-set ) — Returns the qualified name of the first node in node-set.
Some of these functions can be very useful. For example, you can number the states in the XML sample from earlier today by using the position() function, as shown in Listing 9.12.
Example 9.12. An XSL Style Sheet That Uses position() (ch09_12.xsl)
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <xsl:template match="states">
            <HTML>
                <HEAD>
                    <TITLE>
                        The States
                    </TITLE>
                </HEAD>
                <BODY>
                    <H1>
                        The States
                    </H1>
                    <xsl:apply-templates select="state"/>
                </BODY>
            </HTML>
        </xsl:template>

        <xsl:template match="state">
            <P>
                <xsl:value-of select="position()"/>.
                <xsl:value-of select="name"/>
            </P>
        </xsl:template>

    </xsl:stylesheet>
Here's what an XSLT processor produces when you use this style sheet on the sample XML document:
    <HTML>
        <HEAD>
            <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
            <TITLE>
                The States
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                The States
            </H1>
            <P>1. California</P>
            <P>2. Massachusetts</P>
            <P>3. New York</P>
        </BODY>
    </HTML>
Note that the states are indeed numbered. Also, as with today's other examples, the whitespace and indenting here have been cleaned up. Figure 9.5 shows the result of this transformation.
Figure 9.5 Numbering items by using XSLT.
When you're working on the nodes in a node set, you can use functions such as position() to target specific nodes. For example, child::state[position() = 1] selects the first <state> child of the node to which you apply this location step, and child::state[position() = last()] selects the last.
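Here is a small Python sketch of the same ideas, using the standard library's xml.etree.ElementTree. That module supports only a limited subset of XPath, but the positional predicates [1] and [last()] are part of it. The sample markup is a trimmed stand-in for the states document and is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the sample document; the exact markup is an assumption.
states = ET.fromstring("""
<states>
  <state><name>California</name></state>
  <state><name>Massachusetts</name></state>
  <state><name>New York</name></state>
</states>
""")

# Number each state the way position() does: counting starts at 1.
numbered = [
    f"{position}. {state.findtext('name')}"
    for position, state in enumerate(states.findall("state"), start=1)
]

first = states.find("state[1]")      # like child::state[position() = 1]
last = states.find("state[last()]")  # like child::state[position() = last()]
count = len(states.findall("state")) # like count(child::state)

print(numbered)
print(first.findtext("name"), last.findtext("name"), count)
```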
Numbers
XPath can use numbers in expressions (for example, the 1 in the expression child::state[position() = 1]). There are also some operators that you can use to work with numbers:
- + — Addition.
- - — Subtraction.
- * — Multiplication.
- div — Division. Note that the / character that stands for division in other languages is used for other purposes in XML and XPath.
- mod — Modulus. This operation returns the remainder after one number is divided by another.
For example, if you use <xsl:value-of select="2 + 2"/>, you get the string "4" in the output document. The following example selects all states that have more than 200 people per square mile:
    <xsl:template match="states">
        <HTML>
            <BODY>
                <P>
                    <xsl:apply-templates select="state[population div area > 200]"/>
                </P>
            </BODY>
        </HTML>
    </xsl:template>
Besides the numeric operators, XPath also has these functions that work with numbers:
- ceiling() — Returns the smallest integer that is greater than or equal to the number you pass in the parentheses. For example, ceiling(4.6) returns 5.
- floor() — Returns the largest integer that is less than or equal to the number you pass it. For example, floor(4.6) returns 4.
- round() — Rounds the number you pass it to the nearest integer. For example, round(4.6) returns 5.
- sum() — Returns the sum of the numeric values of the nodes in the node set you pass it.
For example, here's how to find the total population of the states in ch09_01.xml by using sum():
    <xsl:template match="states">
        <HTML>
            <BODY>
                <P>
                    The total population is:
                    <xsl:value-of select="sum(child::state/child::population)"/>
                </P>
            </BODY>
        </HTML>
    </xsl:template>
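You can sanity-check the arithmetic with a short Python sketch. It is an approximation of the XPath behavior, not an XPath engine: xpath_round and number are hypothetical helpers, and the sample markup is a trimmed stand-in for ch09_01.xml whose exact structure is an assumption. Note that XPath's round() sends ties toward positive infinity, which differs from Python's built-in round().

```python
import math
import xml.etree.ElementTree as ET

# Trimmed stand-in for ch09_01.xml; the exact markup is an assumption.
states = ET.fromstring("""
<states>
  <state>
    <name>California</name> <population units="people">33871648</population>
    <area units="square miles">155959</area>
  </state>
  <state>
    <name>Massachusetts</name> <population units="people">6349097</population>
    <area units="square miles">7840</area>
  </state>
  <state>
    <name>New York</name> <population units="people">18976457</population>
    <area units="square miles">47214</area>
  </state>
</states>
""")

def xpath_round(x):
    """Approximates XPath 1.0 round(): nearest integer, ties toward positive infinity."""
    return math.floor(x + 0.5)

def number(element, child):
    """Numeric value of a child element's text, like XPath's implicit number conversion."""
    return float(element.findtext(child))

# Like sum(state/population) evaluated with <states> as the context node.
total = sum(number(state, "population") for state in states.findall("state"))

# Like state[population div area > 200].
dense = [
    state.findtext("name")
    for state in states.findall("state")
    if number(state, "population") / number(state, "area") > 200
]

print(math.ceil(4.6), math.floor(4.6), xpath_round(4.6))
print(int(total))
print(dense)
```

With these figures, every state in the sample clears the 200 people per square mile threshold, so the density predicate selects all three.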
Strings
Strings in XPath are sequences of Unicode characters. A number of functions are specially designed to work on strings. Here they are:
- concat( string1, string2, ... ) — Returns the strings joined together.
- contains( string1, string2 ) — Returns true if the first string contains the second one.
- format-number( number1, string2, string3 ) — Returns a string that holds the formatted version of number1, using string2 as a formatting string and string3 as the optional name of a decimal format declared with <xsl:decimal-format>. This function is defined by XSLT rather than XPath. (You create formatting strings as you would for Java's java.text.DecimalFormat class.)
- normalize-space( string1 ) — Returns string1 after stripping leading and trailing whitespace and replacing each run of consecutive whitespace characters with a single space.
- starts-with( string1, string2 ) — Returns true if the first string starts with the second string.
- string-length( string1 ) — Returns the number of characters in string1.
- substring( string1, offset, length ) — Returns length characters from the string, starting at offset. The first character is at offset 1; if you omit length, the rest of the string is returned.
- substring-after( string1, string2 ) — Returns the part of string1 after the first occurrence of string2.
- substring-before( string1, string2 ) — Returns the part of string1 before the first occurrence of string2.
- translate( string1, string2, string3 ) — Returns string1 with all occurrences of the characters in string2 replaced with the matching characters in string3.
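The behavior of a few of these functions is easier to remember once you have seen it spelled out. The Python functions below are hypothetical approximations written for this illustration; they follow the XPath 1.0 definitions for ordinary cases, and substring here accepts integer arguments only.

```python
import re

def substring(s, offset, length=None):
    """Like substring(): positions count from 1. Integer arguments only in this sketch."""
    start = max(offset, 1)
    if length is None:
        return s[start - 1:]
    end = offset + length  # first position that is NOT included
    return s[start - 1:end - 1] if end > start else ""

def substring_before(s, sep):
    """Like substring-before(): empty string when sep does not occur."""
    index = s.find(sep)
    return s[:index] if index >= 0 else ""

def substring_after(s, sep):
    """Like substring-after(): empty string when sep does not occur."""
    index = s.find(sep)
    return s[index + len(sep):] if index >= 0 else ""

def translate(s, src, dst):
    """Like translate(): characters in src with no partner in dst are removed."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:  # the first occurrence in src wins
            table[ord(ch)] = dst[i] if i < len(dst) else None  # None deletes
    return s.translate(table)

def normalize_space(s):
    """Like normalize-space(): trim, then collapse runs of XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(" \t\r\n"))

print(substring("12345", 2, 3))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(translate("bar", "abc", "ABC"))
print(normalize_space("  New \n  York "))
```

The main trap when moving between the two languages is indexing: XPath counts characters from 1, while Python counts from 0.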
Now you know what items can go into location steps—axes, node tests, and predicates. XPath syntax is far from intuitive, so let's see some more examples as you take a look at XPath abbreviations and default rules.
XPath Abbreviations and Default Rules
So far you have specifically indicated what axis you wanted to use when writing location steps, but there are ways to abbreviate location steps to make things easier. For example, as mentioned earlier, the location step child::state points to a <state> element that is a child element of the context node, but you can abbreviate that location step simply as state. These are the legal abbreviations:
| Location Step | Abbreviation |
| --- | --- |
| self::node() | . |
| parent::node() | .. |
| child::childname | childname |
| attribute::childname | @childname |
| /descendant-or-self::node()/ | // |
You can also abbreviate predicate expressions. For example, you can abbreviate [position() = 8] as [8].
Here are some examples of location paths using abbreviated syntax:
- * — Matches all element children of the context node.
- */*/state — Matches all <state> great-grandchildren of the context node.
- . — Matches the context node.
- .. — Matches the parent of the context node.
- ../@units — Matches the units attribute of the parent of the context node.
- .//state — Matches all <state> element descendants of the context node.
- //state — Matches all <state> descendants of the root node.
- //state/name — Matches all <name> elements that have a <state> parent.
- /states/state[4]/name[3] — Matches the third <name> element of the fourth <state> element of the <states> element.
- @* — Matches all the attributes of the context node.
- @units — Matches the units attribute of the context node.
- state — Matches the <state> element children of the context node.
- state[@nickname and @units] — Matches all the <state> children of the context node that have both a nickname attribute and a units attribute.
- state[@units = "people"] — Matches all <state> children of the context node that have a units attribute that has the value "people".
- state[7] — Matches the seventh <state> child of the context node.
- state[7][@units = "people"] — Matches the seventh <state> child of the context node if that child has a units attribute with the value "people".
- state[last()] — Matches the last <state> child of the context node.
- state[name] — Matches the <state> children of the context node that themselves have <name> children.
- state[name="Massachusetts"] — Matches the <state> child nodes of the context node that have <name> children whose text value is "Massachusetts".
- states//state — Matches all <state> element descendants of the <states> element children of the context node.
- text() — Matches all child text nodes of the context node.
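Several of these abbreviated paths can be tried out directly in Python, because the standard library's xml.etree.ElementTree understands a subset of the abbreviated syntax (it does not support axis names, functions other than last(), or absolute paths on an element). The following is a minimal sketch; the sample markup is a trimmed stand-in for the states document and is an assumption, and the <states> element plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the sample document; the exact markup is an assumption.
states = ET.fromstring("""
<states>
  <state>
    <name>California</name> <population units="people">33871648</population>
    <bird>Quail</bird> <area units="square miles">155959</area>
  </state>
  <state>
    <name>Massachusetts</name> <population units="people">6349097</population>
    <bird>Chickadee</bird> <area units="square miles">7840</area>
  </state>
  <state>
    <name>New York</name> <population units="people">18976457</population>
    <bird>Bluebird</bird> <area units="square miles">47214</area>
  </state>
</states>
""")

# state/name: <name> children of <state> children of the context node
names = [e.text for e in states.findall("state/name")]

# state[2] and state[last()]: positional predicates
second = states.find("state[2]/name").text
final = states.find("state[last()]/name").text

# state[name='Massachusetts']: filter on the text of a child element
mass_bird = states.find("state[name='Massachusetts']/bird").text

# .//*[@units='people']: any descendant element with a matching attribute
people = [e.tag for e in states.findall(".//*[@units='people']")]

# state/bird/..: step back up to the parent of each <bird>
bird_parents = [e.findtext("name") for e in states.findall("state/bird/..")]

print(names, second, final, mass_bird, people, bird_parents)
```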
Listing 9.13 shows an example that uses abbreviated syntax. This example picks out the state bird for each state and lists it by using text such as "The Quail is the California state bird." When you're inside a <state> element's <bird> template, you can reach the <name> element of the state by using ../name, as shown in this example.
Example 9.13. An XSL Style Sheet That Uses Abbreviated Syntax (ch09_13.xsl)
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <xsl:template match="states">
            <HTML>
                <BODY>
                    <xsl:apply-templates select="state"/>
                </BODY>
            </HTML>
        </xsl:template>

        <xsl:template match="state">
            <P>
                <xsl:apply-templates select="bird"/>
            </P>
        </xsl:template>

        <xsl:template match="bird">
            The <xsl:value-of select="."/>
            is the <xsl:value-of select="../name"/>
            state bird.
        </xsl:template>

    </xsl:stylesheet>
Here are the results of applying this style sheet to the sample XML document:
    <HTML>
        <BODY>
            <P>
                The Quail is the California state bird.
            </P>
            <P>
                The Chickadee is the Massachusetts state bird.
            </P>
            <P>
                The Bluebird is the New York state bird.
            </P>
        </BODY>
    </HTML>
Figure 9.6 shows these results. This is a good example of how to extract and work with data from XML documents by using XSLT.
Figure 9.6 Using abbreviated syntax.
While we're discussing abbreviated syntax, it's worth noting that XSLT also has some built-in default rules, some of which you've already seen in action.
The most important default rule applies to elements, and here's how you might put it in XSLT syntax:
    <xsl:template match="/ | *">
        <xsl:apply-templates/>
    </xsl:template>
What this means is that if you don't supply a template for an element, that element is still processed with <xsl:apply-templates/> to handle the element's child nodes.
Similarly, the default rule for attributes is to place the value of the attribute in the output document, as in this example:
    <xsl:template match="@*">
        <xsl:value-of select="."/>
    </xsl:template>
The default rule for text is to just insert the text into the output document. That rule can be expressed like this, where the XPath text() node test matches text nodes:
    <xsl:template match="text()">
        <xsl:value-of select="."/>
    </xsl:template>
However, the content of processing instructions (which may be matched by using the XPath processing-instruction() node test) and comments (which may be matched by using the XPath comment() node test) is not inserted into the output document by default. You might express their default rules like this:
    <xsl:template match="processing-instruction()"/>
    <xsl:template match="comment()"/>
In fact, you can create whole style sheets that rely entirely on default rules. Here's what that might look like:
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    </xsl:stylesheet>
Here's what you get when you apply this default-rules-only style sheet to ch09_01.xml:
<?xml version="1.0" encoding="UTF-8"?> California 33871648 Sacramento Quail Golden Poppy 155959 Massachusetts 6349097 Boston Chickadee Mayflower 7840 New York 18976457 Albany Bluebird Rose 47214
Note that just the raw data in the document is transferred to the output document, which is the way things work by default in XSLT.
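To see why only the text survives, it can help to imitate the default rules by hand. The Python sketch below is a rough emulation using xml.etree.ElementTree, not an XSLT processor: apply_default_rules is a hypothetical function, and the sample markup (including the <capital> element name) is an assumed reconstruction of ch09_01.xml. Attribute values such as "people" never appear because the element rule processes child nodes only, and attributes are not children.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of ch09_01.xml; element names such as <capital> are guesses.
states = ET.fromstring("""
<states>
  <state>
    <name>California</name> <population units="people">33871648</population>
    <capital>Sacramento</capital> <bird>Quail</bird>
    <flower>Golden Poppy</flower> <area units="square miles">155959</area>
  </state>
  <state>
    <name>Massachusetts</name> <population units="people">6349097</population>
    <capital>Boston</capital> <bird>Chickadee</bird>
    <flower>Mayflower</flower> <area units="square miles">7840</area>
  </state>
  <state>
    <name>New York</name> <population units="people">18976457</population>
    <capital>Albany</capital> <bird>Bluebird</bird>
    <flower>Rose</flower> <area units="square miles">47214</area>
  </state>
</states>
""")

def apply_default_rules(element):
    """Element rule: process the children. Text rule: copy the text through."""
    pieces = [element.text or ""]
    for child in element:
        pieces.append(apply_default_rules(child))  # recurse, like <xsl:apply-templates/>
        pieces.append(child.tail or "")            # text that follows the child
    return "".join(pieces)

# Collapse whitespace so the result is easy to compare with the output above.
raw_text = " ".join(apply_default_rules(states).split())
print(raw_text)
```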
XPath Tools
There's no question that it can take some time to get used to XPath syntax. Fortunately, there are some good tools out there to help, such as the XPath Visualiser by Dimitre Novatchev, which you can get for free at http://www.vbxml.com/downloads/default.asp?id=visualiser. To use this tool, you just have to browse to the XML document you want to work with and enter the XPath expression you want to check. The XPath Visualiser then highlights in yellow nodes that match your expression. For example, Figure 9.7 shows this tool working on the sample XML document with the XPath expression //*[@units]. This is a great way to test your XPath expressions until you get them to do what you want; all you need in order to use this tool is a browser.
Figure 9.7 Using the XPath Visualiser.