- Sams Teach Yourself XML in 21 Days, Third Edition
Creating Tags and Elements
You give structure to the data in an XML document using elements. An XML element consists of a start tag and an end tag (except for empty elements, which can be written as a single tag), and it might include character data, other elements, or both. We've already seen both tags and elements in action.
Creating Tag Names
In XML 1.0, the names you give to a tag, like "message" in the tag <message>, are tightly controlled. You can start a tag name with a letter, an underscore, or a colon. The characters after that may be letters, digits, underscores, hyphens, periods, and colons (but no whitespace).
In XML 1.1, things have changed. Instead of saying that everything not permitted is forbidden, XML 1.1 names are designed so that everything that is not forbidden is permitted. The idea is that because Unicode will continue to grow, further changes to XML can be avoided by allowing almost any character, including those not yet assigned, in names.
Formally speaking, in XML 1.1 you can start a name with :, A to Z, _, a to z, or a character in one of these Unicode ranges (given as hexadecimal code points): #xC0 to #x2FF, #x370 to #x37D, #x37F to #x1FFF, #x200C to #x200D, #x2070 to #x218F, #x2C00 to #x2FEF, #x3001 to #xD7FF, and #xF900 to #xEFFFF. This excludes -, ., and digits. The characters after the first may include all the characters you can start a name with, as well as -, ., 0 to 9, #xB7, #x0300 to #x036F, and #x203F to #x2040.
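The XML 1.1 rules above boil down to two sets of code point ranges: one for the first character of a name and one for the characters that follow. As a rough sketch (Python is used here only for illustration, and the names NAME_START_RANGES, NAME_EXTRA_RANGES, and is_xml11_name are hypothetical, not part of any XML library), a name checker could transcribe those ranges directly. It does not apply namespace rules about colons, so a real XML 1.1 parser remains the final authority.

```python
# Ranges allowed for the FIRST character of an XML 1.1 name (hypothetical table,
# transcribed from the XML 1.1 name rules).
NAME_START_RANGES = [
    (ord(":"), ord(":")),
    (ord("A"), ord("Z")),
    (ord("_"), ord("_")),
    (ord("a"), ord("z")),
    (0xC0, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),
    (0x200C, 0x200D),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xEFFFF),
]

# Extra ranges allowed only AFTER the first character.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")),
    (ord("."), ord(".")),
    (ord("0"), ord("9")),
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def in_ranges(ch: str, ranges) -> bool:
    """Return True if the code point of ch falls inside any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_xml11_name(name: str) -> bool:
    """Check a tag or attribute name against the XML 1.1 name character rules."""
    if not name or not in_ranges(name[0], NAME_START_RANGES):
        return False
    return all(
        in_ranges(ch, NAME_START_RANGES) or in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in name[1:]
    )


for candidate in ["DOCUMENT", "Section-19", "_text", "2005", ".text", "Loan Number"]:
    print(candidate, is_xml11_name(candidate))
```

Keeping the ranges as data rather than one large regular expression makes it easy to compare the table line by line against the specification.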
For example, here are some allowable XML tags:
<DOCUMENT> <document> <Chapter15> <Section-19> <_text>
Bear in mind that tag names are case sensitive, so <PUMPKIN> is not the same as <pumpkin>, which is not the same as <PuMpKiN>. Actually, your document can have <PUMPKIN>, <pumpkin>, and <PuMpKiN> tags at the same time, and they would all be considered different. Here are some tags that are not legal in XML:
<2005> <Loan Number> <.text> <*yay*> <EMPLOYEE(ID)>
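To see case sensitivity and the naming rules enforced by an actual parser, here is a small sketch using Python's standard xml.etree.ElementTree module (an XML 1.0 parser built on expat; the helper name is_well_formed is our own, not a library function). Differently cased tags become distinct elements, a mismatched end tag is rejected, and each tag from the "not legal" list above fails to parse.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if Python's bundled XML 1.0 parser accepts the markup."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Three differently cased tags coexist as three distinct elements.
doc = ET.fromstring("<root><PUMPKIN/><pumpkin/><PuMpKiN/></root>")
print([child.tag for child in doc])

# An end tag must match its start tag exactly, including case.
print(is_well_formed("<pumpkin>pie</pumpkin>"))
print(is_well_formed("<pumpkin>pie</PUMPKIN>"))

# Each tag name from the "not legal" list is rejected.
for bad in ["<2005/>", "<Loan Number/>", "<.text/>", "<*yay*/>", "<EMPLOYEE(ID)/>"]:
    print(bad, is_well_formed(bad))
```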
So far, the elements you've seen have all contained data or other elements, but an element doesn't need to contain any content at all. Such elements are called empty elements.
Creating Empty Elements
In XML, an empty element is usually written as a single tag rather than as a start tag and an end tag (although a start tag followed immediately by its end tag, as in <heading></heading>, is equivalent). You might be familiar with empty elements from HTML; for example, the HTML <img>, <hr>, and <br> elements are empty, which is to say that they do not enclose any content (either character data or markup). In HTML, such elements are represented with only one tag; there are no closing </img>, </hr>, or </br> tags.
In XML, you close an empty element with />, not just >. For example, if the <heading> element were an empty element, it might appear like this in an XML document:
<?xml version="1.0" encoding="UTF-8"?>
<document>
    <heading/>
    <message>
        This is an XML document!
    </message>
</document>
Empty elements can have attributes, as in this case, where we're using an attribute named text to hold the text content of this element:
<?xml version="1.0" encoding="UTF-8"?>
<document>
    <heading text = "Hello From XML"/>
    <message>
        This is an XML document!
    </message>
</document>
The <…/> syntax is XML's way of making sure that an XML processor isn't left searching for a nonexistent closing tag. In fact, in XHTML, the reformulation of HTML in XML, the <img>, <hr>, and <br> elements are written as <img />, <hr />, and <br />, and HTML browsers don't have a problem with that.
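In XML, an empty-element tag such as <heading/> and a start tag immediately followed by its end tag, such as <heading></heading>, describe the same empty element. A minimal sketch with Python's standard xml.etree.ElementTree module (used here only for illustration) parses both forms and shows that each has no children and no text, keeps its attribute, and is written back out in the short form.

```python
import xml.etree.ElementTree as ET

# The same empty element written two ways.
short_form = ET.fromstring('<heading text="Hello From XML"/>')
long_form = ET.fromstring('<heading text="Hello From XML"></heading>')

for element in (short_form, long_form):
    # No child elements and no text content, but the attribute is still there.
    print(element.tag, len(element), element.text, element.get("text"))

# ElementTree serializes an element with no content as a single tag.
print(ET.tostring(long_form, encoding="unicode"))
```

Note that ElementTree happens to write a space before the slash, the same <br /> style that XHTML uses for compatibility with HTML browsers.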
Creating a Root Element
If you want your document to be well formed, it must have exactly one element that contains all the other elements and text data in the document. This element is called the root element, or the document element. The root element is especially important from an XML processor's point of view, because processors work through an XML document starting from the root element. In ch02_01.xml, developed at the start of this chapter, the root element is the <document> element, although you can give the root element any legal name:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="ch01_04.css"?>
<document>
    <heading>
        Hello From XML
    </heading>
    <message>
        This is an XML document!
    </message>
</document>
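To illustrate the single-root rule, here is a short Python sketch using the standard xml.etree.ElementTree module. ONE_ROOT mirrors the structure of ch02_01.xml (with the style sheet processing instruction left out), while TWO_ROOTS is a hypothetical broken variant with two top-level elements and nothing enclosing them; the parser accepts the first and raises ParseError for the second.

```python
import xml.etree.ElementTree as ET

ONE_ROOT = """<?xml version="1.0" encoding="UTF-8"?>
<document>
    <heading>Hello From XML</heading>
    <message>This is an XML document!</message>
</document>"""

# Hypothetical broken document: two top-level elements, no root element.
TWO_ROOTS = """<heading>Hello From XML</heading>
<message>This is an XML document!</message>"""

# fromstring() returns the root element itself.
root = ET.fromstring(ONE_ROOT)
print("root element:", root.tag)
print("children:", [child.tag for child in root])

try:
    ET.fromstring(TWO_ROOTS)
except ET.ParseError as err:
    # The parser refuses content that follows the first top-level element.
    print("not well formed:", err)
```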
Creating Attributes
XML attributes, which can appear in elements, processing instructions, and XML declarations, work much like attributes in HTML. In XML, you use them as name-value pairs like this: attributename = "value" in start tags and empty-element tags. Unlike in HTML, the values you assign to attributes must be quoted (even if they're numbers), and if you use an attribute, you must assign it a value. (Some HTML attributes, like BORDER, don't need to be assigned a value.) Using DTDs or XML schemas, you can make an attribute required or optional; if it is required, you must use the attribute whenever you use the corresponding element, and you must assign the attribute a value. You can also restrict the values an attribute may be assigned, if you want to.
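The quoting and must-have-a-value rules are easy to confirm with a parser. This minimal Python sketch uses the standard xml.etree.ElementTree module; the helper name parses is our own. A quoted value is accepted with either kind of quotation mark, while an unquoted number and an HTML-style attribute with no value are both rejected.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the parser accepts the markup as well formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses('<table border="1"/>'))  # quoted value: accepted
print(parses("<table border='1'/>"))  # single quotes are accepted too
print(parses("<table border=1/>"))    # unquoted number: rejected
print(parses("<table border/>"))      # HTML-style attribute with no value: rejected
```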
You can see an example in Listing 2.2, where we've given each <employee> element an attribute named status and assigned it the text "retired", "active", or "leave".
Listing 2.2. Using Attributes in an XML Document (ch02_02.xml)
<?xml version = "1.0" standalone="yes"?>
<document>
    <employee status="retired">
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                <product>Printer</product>
                <id>111</id>
                <price>$111.00</price>
            </project>
            <project>
                <product>Laptop</product>
                <id>222</id>
                <price>$989.00</price>
            </project>
        </projects>
    </employee>
    <employee status="active">
        <name>
            <lastname>Grant</lastname>
            <firstname>Cary</firstname>
        </name>
        <hiredate>October 20, 2005</hiredate>
        <projects>
            <project>
                <product>Desktop</product>
                <id>333</id>
                <price>$2995.00</price>
            </project>
            <project>
                <product>Scanner</product>
                <id>444</id>
                <price>$200.00</price>
            </project>
        </projects>
    </employee>
    <employee status="leave">
        <name>
            <lastname>Gable</lastname>
            <firstname>Clark</firstname>
        </name>
        <hiredate>October 25, 2005</hiredate>
        <projects>
            <project>
                <product>Keyboard</product>
                <id>555</id>
                <price>$129.00</price>
            </project>
            <project>
                <product>Mouse</product>
                <id>666</id>
                <price>$25.00</price>
            </project>
        </projects>
    </employee>
</document>
You can see this XML document in Internet Explorer, including the attributes and their values, in Figure 2.10.
Figure 2.10 Viewing element attributes in Internet Explorer.
Just like the data in an element, an XML processor can retrieve the values you've assigned to an element's attributes. We'll see how to do that in both JavaScript and Java later in this book.
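The book's own examples of reading attribute values use JavaScript and Java; purely as an illustration, here is a minimal Python sketch with the standard xml.etree.ElementTree module. DOCUMENT is an abridged copy of ch02_02.xml that we trimmed down to the parts the sketch reads, so it is not the full listing.

```python
import xml.etree.ElementTree as ET

# Abridged version of ch02_02.xml: only the parts this sketch reads.
DOCUMENT = """<document>
    <employee status="retired"><name><lastname>Kelly</lastname></name></employee>
    <employee status="active"><name><lastname>Grant</lastname></name></employee>
    <employee status="leave"><name><lastname>Gable</lastname></name></employee>
</document>"""

root = ET.fromstring(DOCUMENT)
for employee in root.iter("employee"):
    # get() returns the attribute value as a string, or None if it is absent.
    status = employee.get("status")
    # findtext() returns the text of the first matching subelement.
    lastname = employee.findtext("name/lastname")
    print(f"{lastname}: {status}")
```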
Attributes hold data, and elements hold data—so when should you use which? It's up to you, but practically speaking, there are two things to take into account. The first is that you can't specify document structure using attributes. For example, this <employee> element makes it clear what data you're storing about an employee:
<employee status="retired">
    <name>
        <lastname>Kelly</lastname>
        <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
        <project>
            <product>Printer</product>
            <id>111</id>
            <price>$111.00</price>
        </project>
        <project>
            <product>Laptop</product>
            <id>222</id>
            <price>$989.00</price>
        </project>
    </projects>
</employee>
A good rule to follow, therefore, is to use elements to structure your document, and to use attributes when you have more information to include about a specific element, as when you want to indicate the language the enclosed text is in. Here's an example where we're storing the standard abbreviation for U.S. English, "en-US", in an attribute:
<text language="en-US"> It was a dark and stormy night. A shot rang out! . . . </text>
Also, it's worth noting that using too many attributes can make a document hard to read, something you'll readily see if you start converting the earlier <employee> element to use attributes rather than subelements to hold its data:
<employee status="retired">
    <name lastname="Kelly" firstname="Grace"/>
    <hiredate>October 15, 2005</hiredate>
    <projects>
        <project product="Printer" id="111" price="$111.00"/>
        . . .
Naming Attributes
In XML, attribute names must follow the same rules as those for element names. That means in XML 1.0 you can start an attribute name with a letter, an underscore, or a colon, and the next characters may be letters, digits, underscores, hyphens, periods, and colons (but no whitespace). In XML 1.1, you follow the rules for XML 1.1 names, as discussed earlier today.
Here are some legal attribute name examples:
<brush width="10" height="5" color="cyan"/>
<point x="10" y="100"/>
<book title="My Sweet Summer" review="Yuck!"/>
<vegetable name="broccoli" color="green"/>
Here are some attribute names that are not legal:
<fish measured length="500"/>
<friend 1stPhone="555.2222" 2ndPhone="555.3333"/>
<application .NET="yes"/>
<person name(or nick name)="sammy"/>
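As a quick check of the attribute naming examples above, this Python sketch feeds each one to the standard xml.etree.ElementTree parser (an XML 1.0 parser). The LEGAL and ILLEGAL lists and the parses helper are our own illustration. Every legal example parses, and every illegal one is rejected: a space splits "measured length" into two names, "1stPhone" starts with a digit, ".NET" starts with a period, and parentheses are not name characters.

```python
import xml.etree.ElementTree as ET

LEGAL = [
    '<brush width="10" height="5" color="cyan"/>',
    '<point x="10" y="100"/>',
    '<book title="My Sweet Summer" review="Yuck!"/>',
    '<vegetable name="broccoli" color="green"/>',
]

ILLEGAL = [
    '<fish measured length="500"/>',
    '<friend 1stPhone="555.2222" 2ndPhone="555.3333"/>',
    '<application .NET="yes"/>',
    '<person name(or nick name)="sammy"/>',
]


def parses(text: str) -> bool:
    """Return True if the parser accepts the markup as well formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


for markup in LEGAL + ILLEGAL:
    print(parses(markup), markup)
```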
Assigning Values to Attributes
As noted, all data in XML documents is text, including the data you assign to attributes. Even when you assign a number to an attribute, you treat that number as if it were text:
<constant name="pi" value="3.1415926"/>
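Because attribute values are text, a processor hands them to your code as strings, and any numeric meaning is up to the application. A minimal Python sketch with the standard xml.etree.ElementTree module makes the conversion explicit:

```python
import xml.etree.ElementTree as ET

constant = ET.fromstring('<constant name="pi" value="3.1415926"/>')

# The parser returns the attribute value as a string, not a number.
raw = constant.get("value")
print(type(raw).__name__, raw)

# Converting to a number is the application's job.
value = float(raw)
print(type(value).__name__, value)
```

If the string were not a valid number, float() would raise ValueError, which is one reason applications often validate attribute values with a DTD or schema first.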
You can use single or double quotation marks when quoting an attribute's value. By convention, double quotation marks are usually used, but if the value you're quoting contains double quotation marks—for example, He said, "No worries."—you can't just surround that value with double quotation marks, because the XML processor won't understand where the quotation begins and ends. Instead, you can use single quotation marks to begin and end the attribute's value like this:
<citation text='He said, "No worries."' />
What if the attribute value contains both single and double quotation marks, as when you want to say The tree was 16' 3" tall? In this case, you can use the XML-defined general entity references for a single quotation mark, &apos;, and for a double quotation mark, &quot;, like this:
<citation text="The tree was 16&apos; 3&quot; tall" />
The XML processor will turn this back into The tree was 16' 3" tall when it parses this text.
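Here is a short Python sketch of both directions, using the standard library only. Parsing with xml.etree.ElementTree strips the delimiters and replaces &apos; and &quot; with the characters they stand for. Going the other way, xml.sax.saxutils.quoteattr() chooses delimiters for you: it uses single quotes when the value contains only double quotes, and when both kinds appear it keeps double-quote delimiters and escapes just the double quotes (leaving the apostrophe alone is equally well formed).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The two citations from the text, as a parser sees them.
single_quoted = ET.fromstring("""<citation text='He said, "No worries."' />""")
with_entities = ET.fromstring('<citation text="The tree was 16&apos; 3&quot; tall" />')

# The parser removes the delimiters and resolves the entity references.
print(single_quoted.get("text"))
print(with_entities.get("text"))

# quoteattr() returns the value already wrapped in suitable quotation marks.
print(quoteattr('He said, "No worries."'))
print(quoteattr("The tree was 16' 3\" tall"))
```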
Specifying Language with the xml:lang Attribute
Besides xml:space, there's one more attribute that comes built into XML—xml:lang, which lets you specify the language of a document, such as English, German, and so on. Although xml:space and xml:lang are "built into" XML, and so should be usable with any element, some XML processors will not support these attributes.
You can set the xml:lang attribute to these values:
- A two-letter language code as defined by the International Organization for Standardization (ISO) document 639:1988, "Code for the Representation of Names of Languages."
- A language identifier registered with the Internet Assigned Numbers Authority (IANA) in the document "Registry of Language Tags." See http://www.isi.edu/in-notes/iana/assignments/languages/. Such identifiers begin with the prefix "i-" (or "I-").
- A language identifier assigned by you, or for private use. Such identifiers should begin with "x-" or "X-".
Here is an example; in this case, we're specifying that the language of an element should be English, using the xml:lang attribute and the ISO language code "en":
<p xml:lang="en">The quick brown fox jumped over the lazy dog.</p>
Besides specifying the language, you can also specify a language subcode to indicate a regional variation or dialect, such as U.S. English. These subcodes are two characters each, and they're also defined by the International Organization for Standardization in the document ISO 3166-1:1997, "Codes for the Representation of Names of Countries and Their Subdivisions—Part 1: Country Codes." For example, here's how you might specify that one element holds British English content and another holds American English content:
<p xml:lang="en-GB">What colour is the sky?</p>
<p xml:lang="en-US">What color is the sky?</p>
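An xml:lang value applies to the element it appears on and to that element's descendants, unless a descendant sets its own. As a sketch in Python (the effective_languages helper and the sample DOCUMENT are hypothetical), note that the standard xml.etree.ElementTree module reports xml:lang under the XML namespace URI http://www.w3.org/XML/1998/namespace, so the attribute key is written in ElementTree's {uri}name form. The helper walks the tree and passes the language in scope down to each child.

```python
import xml.etree.ElementTree as ET

# ElementTree's key for xml:lang: the fixed XML namespace URI plus "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOCUMENT = """<document xml:lang="en">
    <p xml:lang="en-GB">What colour is the sky?</p>
    <p xml:lang="en-US">What color is the sky?</p>
    <p>The quick brown fox jumped over the lazy dog.</p>
</document>"""


def effective_languages(root):
    """Return (element, language) pairs, inheriting xml:lang from ancestors."""
    pairs = []

    def walk(element, inherited):
        # An element's own xml:lang overrides the one it inherits.
        lang = element.get(XML_LANG, inherited)
        pairs.append((element, lang))
        for child in element:
            walk(child, lang)

    walk(root, None)
    return pairs


root = ET.fromstring(DOCUMENT)
for element, lang in effective_languages(root):
    if element.tag == "p":
        print(lang, element.text)
```

The third <p> element has no xml:lang of its own, so it picks up "en" from the document element.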
Note that xml:lang specifies the language used in both the element's content (which covers all the text data in the document if you use xml:lang on the document element) and the element's attribute values, as here, where we're using German in an element's attributes:
<p farbe="weiss" xml:lang="de">