- Sams Teach Yourself XML in 21 Days, Third Edition
Understanding the Well-Formedness Constraints
The well-formedness constraints in the XML 1.0 specification are sprinkled throughout the document, and some of them are hard to dig out because they're not clearly marked. You'll get a look at the well-formedness constraints here, although note that some of them have to do with DTDs and entity references, and those will appear in Day 4, "Creating Valid XML Documents: Document Type Definitions," and Day 5, "Handling Attributes and Entities in DTDs."
Beginning the Document with an XML Declaration
The first well-formedness constraint is to start the document with an XML declaration. Strictly speaking, XML 1.0 only recommends the declaration (XML 1.1 requires it), and many XML processors won't insist on it, but W3C says you should always include it. When it is present, it must be the very first thing in the document:
<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<document>
    <employee>
        . . .
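To see how a parser treats the position of the declaration, here is a minimal sketch in Python (an assumption; the book's own examples use other languages) that relies on the standard library's xml.etree.ElementTree module. The parses helper is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text as well-formed (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

with_declaration = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><document/>'
late_declaration = ' ' + with_declaration   # a single space before the declaration
no_declaration = '<document/>'

print(parses(with_declaration))   # declaration is the first thing in the document
print(parses(late_declaration))   # declaration is not at the very start
print(parses(no_declaration))     # this XML 1.0 parser tolerates a missing declaration
```

The second case is the one that tends to surprise people: even whitespace in front of the declaration is enough for the parser to reject the document.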
Using Only Legal Character References
Another well-formedness constraint is that character references, which are character codes enclosed in &# and ; (for example, &#65; for the letter A) and which are replaced by the characters those codes stand for, must refer only to characters supported by the XML specification.
This constraint is more or less obvious—it simply means that you have to stick to the established character set for the version of XML you're using. Note that, as you saw yesterday, the characters that are legal in XML 1.0 differ somewhat from what's legal in XML 1.1.
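As a rough illustration of this constraint, the following Python sketch (Python and the sample note element are assumptions, not part of the book's example) feeds one document with legal character references and one with an illegal reference to the standard library parser.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to ordinary letters are legal.
legal = ET.fromstring('<note>&#88;&#x4D;&#76;</note>')
print(legal.text)   # the references are replaced by the characters they stand for

# U+0001 is a control character that XML 1.0 does not allow, even as a reference.
try:
    ET.fromstring('<note>&#1;</note>')
    illegal_accepted = True
except ET.ParseError as err:
    illegal_accepted = False
    print("rejected:", err)
```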
Including at Least One Element
To be a well-formed document, a document must include one or more elements. The first element, of course, is the root element, so to be well-formed, a document must contain at least a root element. In other words, an XML document must contain more than just a prolog. Of course, your documents will usually contain many elements, as in our example document:
<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<document>
    <employee>
        <name>
            <lastname>Kelly</lastname>
            <firstname>Grace</firstname>
        </name>
        <hiredate>October 15, 2005</hiredate>
        <projects>
            <project>
                . . .
            </project>
        </projects>
    </employee>
    . . .
</document>
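The "more than just a prolog" rule can be checked with a short Python sketch (Python and the error_for helper are assumptions made for illustration). It parses a document that consists of a prolog only, and then the same prolog followed by a root element.

```python
import xml.etree.ElementTree as ET

def error_for(xml_text):
    """Return the parser's complaint, or None if the text is well-formed (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

prolog_only = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
               '<!-- a prolog with a comment, but no elements -->')
with_root = prolog_only + '\n<document/>'

print(error_for(prolog_only))   # the parser reports that it found no element
print(error_for(with_root))     # None: a lone root element is enough
```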
Structuring Elements Correctly
HTML browsers are pretty easygoing about how you structure HTML elements in a Web page as long as they can understand what you're doing. For example, you can often omit closing tags in elements—you might use a <p> tag and then follow it with another <p> tag—without using a </p> tag—and the browser will have no problem.
That's not the way things work in XML. In XML, every non-empty element must have both a start tag and an end tag, as in our example document:
<employee>
    <name>
        <lastname>Gable</lastname>
        <firstname>Clark</firstname>
    </name>
    <hiredate>October 25, 2005</hiredate>
    <projects>
        <project>
            <product>Keyboard</product>
            <id>555</id>
            <price>$129.00</price>
        </project>
        <project>
            <product>Mouse</product>
            <id>666</id>
            <price>$25.00</price>
        </project>
    </projects>
</employee>
Besides making sure that every non-empty element has an opening tag and a closing tag, another well-formedness constraint says that end tags must match start tags, and both must use the same name.
Some elements, the empty elements, don't need closing tags. These elements have no content of any kind (although they can have attributes), which means that they do not enclose any character data or markup. Instead, such an element can be made up entirely of one tag, like this:
<?xml version = "1.0" standalone="yes"?>
<document>
    <heading text = "Hello From XML"/>
</document>
In XML, an empty-element tag like this one must always end with />. (Writing an empty element as a start tag immediately followed by its end tag is also legal.)
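Here is a small Python sketch of both points, assuming the standard library parser; the error_for helper and the paragraph text are hypothetical. It first contrasts HTML-style omitted end tags with properly closed elements, and then compares an empty-element tag with the equivalent start tag and end tag pair.

```python
import xml.etree.ElementTree as ET

def error_for(xml_text):
    """Return the parser's complaint, or None if the text is well-formed (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

html_style = '<document><p>First paragraph<p>Second paragraph</document>'
xml_style = '<document><p>First paragraph</p><p>Second paragraph</p></document>'
print(error_for(html_style))   # an error message: the <p> elements are never closed
print(error_for(xml_style))    # None

# Two ways of writing an element with no content.
short_form = ET.fromstring('<heading text = "Hello From XML"/>')
long_form = ET.fromstring('<heading text = "Hello From XML"></heading>')
print(short_form.attrib == long_form.attrib, short_form.text, long_form.text)
```

Both spellings of the heading element produce the same parsed result, which is why the short form is usually preferred for elements that never have content.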
Using the Root Element to Contain All Other Elements
Another well-formedness constraint is that the root element must contain all the other elements in the document, as in our sample XML document, where we have three <employee> elements, which themselves contain other elements, in the document element:
<?xml version = "1.0" encoding="UTF-8" standalone="yes"?>
<document>
    <employee>
        . . .
    </employee>
    <employee>
        . . .
    </employee>
    <employee>
        . . .
    </employee>
</document>
That's how a well-formed XML document works: you start with a prolog, followed by the root element, which contains all the other elements, if there are any. Among other things, containing all elements in a root element makes it easier for an XML processor to understand the structure of an XML document: starting at the single root element, it can navigate the entire document.
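The following Python sketch (an illustration under the assumption that the standard library parser is used; the shortened employee data is made up for brevity) shows a parser rejecting a document with two top-level elements, and then walking a whole document from its single root.

```python
import xml.etree.ElementTree as ET

# Two top-level elements and no single root: not well-formed.
try:
    ET.fromstring('<employee/><employee/>')
    two_roots_ok = True
except ET.ParseError as err:
    two_roots_ok = False
    print("rejected:", err)

# With one root element, every other element is reachable from it.
document = ET.fromstring(
    '<document>'
    '<employee><name><lastname>Kelly</lastname></name></employee>'
    '<employee><name><lastname>Gable</lastname></name></employee>'
    '</document>'
)
tags = [element.tag for element in document.iter()]   # document order, root first
print(tags)
```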
Nesting Elements Properly
Nesting elements correctly is a big part of well-formedness; the requirement here is that if an element contains the start tag of a non-empty element, it must also contain that element's end tag. In other words, elements cannot overlap: an element that starts inside another element must also end inside it. For example, this XML is nested properly:
<employee>
    <name>
        <lastname>Kelly</lastname>
        <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
        <project>
            <product>Printer</product>
            <id>111</id>
            <price>$111.00</price>
        </project>
        <project>
            <product>Laptop</product>
            <id>222</id>
            <price>$989.00</price>
        </project>
    </projects>
</employee>
But as you can see, there's a nesting problem in this next element, because an XML processor will encounter the closing </projects> tag before finding the closing </project> tag it's looking for at the end of the current <project> element:
<employee>
    <name>
        <lastname>Kelly</lastname>
        <firstname>Grace</firstname>
    </name>
    <hiredate>October 15, 2005</hiredate>
    <projects>
        <project>
            <product>Printer</product>
            <id>111</id>
            <price>$111.00</price>
        </project>
        <project>
            <product>Laptop</product>
            <id>222</id>
            <price>$989.00</price>
    </projects>
        </project>
</employee>
In fact, this nesting requirement is where the whole term well-formed comes from—the original idea was that a document where the elements were not garbled and mixed up with each other was well-formed.
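One way to picture the nesting rule is as a stack of open elements: each start tag is pushed, and each end tag must match the tag on top of the stack. The Python sketch below is a deliberately simplified illustration of that idea, not how any particular XML processor is implemented. The function name and the regular expression are assumptions, and the sketch ignores comments, CDATA sections, and processing instructions.

```python
import re

# Matches start tags, end tags, and empty-element tags (simplified; see the caveats above).
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")

def first_nesting_error(xml_text):
    """Return a message for the first nesting problem, or None if the tags balance."""
    open_tags = []                              # stack of elements that are still open
    for match in TAG.finditer(xml_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                            # <name/> opens and closes at once
        if closing:
            if not open_tags or open_tags[-1] != name:
                expected = open_tags[-1] if open_tags else "nothing"
                return f"found </{name}> but expected </{expected}>"
            open_tags.pop()
        else:
            open_tags.append(name)
    if open_tags:
        return f"<{open_tags[-1]}> is never closed"
    return None

print(first_nesting_error("<projects><project></project></projects>"))
print(first_nesting_error("<projects><project></projects></project>"))
```

For real documents you would rely on an XML parser rather than a regular expression; the point here is only that an end tag is always checked against the most recently opened element.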
There are other well-formedness constraints that have nothing to do with elements, however—for example, the next two concern attributes.
Making Attribute Names Unique
Another well-formedness constraint is that you can't use the same attribute more than once in one start-tag or empty-element tag. This one seems more or less obvious, and it's hard to see how you might violate it except by mistake, as in this case:
<message text="Hi there!" text="Hello!">
XML is case sensitive, so Text and text count as different attribute names, and you could legally do something like this:
<message Text="Hi there!" text="Hello!">
Obviously, that's not a very good idea, however; attribute names that differ only in capitalization are bound to be confusing.
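Both cases can be tried out with a short Python sketch (Python is an assumption; the message elements are the ones shown above, written as empty-element tags so that each string is a complete document).

```python
import xml.etree.ElementTree as ET

# The same attribute name twice in one tag: not well-formed.
try:
    ET.fromstring('<message text="Hi there!" text="Hello!"/>')
    duplicate_accepted = True
except ET.ParseError as err:
    duplicate_accepted = False
    print("rejected:", err)

# Names that differ only in case are different attributes, so this parses.
message = ET.fromstring('<message Text="Hi there!" text="Hello!"/>')
print(message.attrib)
```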
Enclosing Attribute Values in Quotation Marks
One well-formedness constraint that trips up most XML novices sooner or later is that you must quote every value you assign to an attribute, using either single quotation marks or double quotation marks. This trips many people up because you don't have to quote attribute values in HTML, as in this HTML example (which also doesn't have a closing tag):
<img src=mountains.jpg>
An XML processor would have problems with this element, however. Here's what it would look like properly constructed:
<img src="mountains.jpg" />
If you prefer, you could use single quotation marks:
<img src='mountains.jpg' />
As you've seen, using single quotation marks helps when an attribute's value contains quoted text:
<message text='I said, "No, no, no!"' />
And as you've also seen, in worst-case scenarios, where an attribute value contains both single and double quotation marks, you can escape " as &quot; and ' as &apos;, as here, where you're reporting the height of a tree as 50' 6":
<tree type="Maple" height="50&apos;6&quot;" />
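If you generate XML from a program, you don't have to work out the quoting by hand. The Python sketch below (Python is an assumption) uses quoteattr from the standard library's xml.sax.saxutils module, which picks a quoting style and escapes whatever it has to, and then checks that the value survives a round trip through the parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# An unquoted attribute value, HTML style, is not well-formed XML.
try:
    ET.fromstring('<img src=mountains.jpg />')
    unquoted_accepted = True
except ET.ParseError as err:
    unquoted_accepted = False
    print("rejected:", err)

height = "50'6\""   # contains both a single and a double quotation mark
tag = '<tree type="Maple" height=%s />' % quoteattr(height)
print(tag)          # quoteattr supplies the surrounding quotes and the escaping

tree = ET.fromstring(tag)
print(tree.get("height") == height)
```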
Avoiding Entity References and < in Attribute Values
Also, W3C makes it an explicit well-formedness constraint that attribute values must not contain references to external entities (this means XML-style references, that is, general entity references or parameter entity references, not just, for example, using an image file's name). This means that an XML processor doesn't have to replace an attribute value with the contents of an external entity.
In addition, another constraint says that you cannot use < in attribute values, because an XML processor might mistake it for markup. If you really have to use the text <, use &lt; instead, which will be turned into < when parsed. For example, this XML:
<project note="This is a <project> element.">
should be written as this, where you're escaping both < and >:
<project note="This is a &lt;project&gt; element.">
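The sketch below, in Python (an assumption), shows the parser rejecting a raw < inside an attribute value and accepting the escaped form. It also shows that when the standard library's ElementTree module writes XML out, it applies the escaping for you.

```python
import xml.etree.ElementTree as ET

raw = '<project note="This is a <project> element."/>'
escaped = '<project note="This is a &lt;project&gt; element."/>'

try:
    ET.fromstring(raw)
    raw_accepted = True
except ET.ParseError as err:
    raw_accepted = False
    print("rejected:", err)

project = ET.fromstring(escaped)
print(project.get("note"))   # the references are turned back into < and >

# Building the element in code and serializing it escapes the value automatically.
built = ET.Element("project", note="This is a <project> element.")
serialized = ET.tostring(built, encoding="unicode")
print(serialized)
```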
In fact, < is a particularly sensitive character to use anywhere in an XML document, except as markup, and that's another well-formedness constraint concerning <, coming up next.
Avoiding Overuse of < and &
XML processors assume that < starts a tag and & starts an entity reference, so you should avoid using those characters for anything else. Sometimes, this is a problem, as in the JavaScript example you saw yesterday, which uses the JavaScript < operator enclosed in a CDATA section:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/tr/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>
            Checking the temperature
        </title>
    </head>
    <body>
        <script language="javascript">
        <![CDATA[
        var temperature
        temperature = 234.77
        if (temperature < 32) {
            document.writeln("Below freezing!")
        }
        ]]>
        </script>
        <center>
            <h1>
                Checking the temperature
            </h1>
        </center>
    </body>
</html>
However, because modern Web browsers don't understand CDATA sections, this solution (which was suggested by W3C) doesn't really work. And if you escape the < operator as &lt;, very few browsers will understand what you're doing.
There are two main ways of handling the < JavaScript operator in XML with today's browsers. You can reverse the logical sense of the test—for example, in this case, instead of checking whether the temperature is below 32, you would check to make sure it isn't above or equal to 32, which lets you use > instead of < (note that the JavaScript ! operator, the Not operator, reverses the logical sense of an expression) :
<script language="javascript">
var temperature
temperature = 234.77
if (!(temperature >= 32)) {
    document.writeln("Below freezing!")
}
</script>
Practically speaking, the best way is usually to remove the whole problem by placing the script code in an external file, which you'll name script.js here, so the browser won't parse it as XML in the first place. You can do that like this in JavaScript (more on JavaScript and how to use it in XML is coming up in Day 15, "Using JavaScript and XML"):
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/tr/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
        <title>
            Checking the temperature
        </title>
    </head>
    <body>
        <script language="javascript" src="script.js">
        </script>
        <center>
            <h1>
                Checking the temperature
            </h1>
        </center>
    </body>
</html>
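Whatever a browser makes of these documents, you can at least confirm how an XML parser sees the three ways of writing the comparison. The Python sketch below (Python is an assumption, and the shortened script elements and the ready variable are made up for illustration) parses a raw < in element content, the escaped form, and the CDATA form.

```python
import xml.etree.ElementTree as ET

raw = '<script> if (temperature < 32) { } </script>'
escaped = '<script> if (temperature &lt; 32) { } </script>'
cdata = '<script><![CDATA[ if (temperature < 32 && ready) { } ]]></script>'

try:
    ET.fromstring(raw)
    raw_accepted = True
except ET.ParseError as err:
    raw_accepted = False          # a bare < in content is taken as the start of a tag
    print("rejected:", err)

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text
print(escaped_text)   # &lt; has been turned back into <
print(cdata_text)     # inside CDATA, < and & are passed through untouched
```

This only shows what an XML processor does with the text; it says nothing about whether a given browser's script engine will accept it, which is the practical problem the external script file sidesteps.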
That completes today's discussion of well-formedness, although you'll see more in the next two days as we discuss the well-formedness constraints that have to do with DTDs.
As your XML documents evolve and become more complex, it's also going to be increasingly important to understand namespaces, which are the second major topic for today.