- Chapter 1: Essential XSLT
- A Little Background
- XML Documents
- What Does XML Look Like in a Browser?
- XSLT Transformations
- Making an XSLT Transformation Happen
- Using Standalone XSLT Processors
- Using Browsers to Transform XML Documents
- Using XSLT and JavaScript in the Internet Explorer
- XSLT Transformations on Web Servers
- XML-to-XHTML Transformations
- XSLT Resources
- XSL Formatting Objects: XSL-FO
- XSL-FO Resources
- Formatting an XML Document
- The XSLT Stylesheet
- Transforming a Document into Formatting Object Form
- Creating a Formatted Document
XML Documents
It's going to be important for you to know how XML documents work, so use this section to ensure that you're up to speed. Here's an example XML document that I'll take a look at:
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
Here's how this document works: I start with the XML declaration, <?xml version="1.0" encoding="UTF-8"?>. It has the same form as an XML processing instruction (all of which start with <? and end with ?>), and it indicates that I'm using XML version 1.0, still by far the most widely used version, and the UTF-8 character encoding, a variable-width encoding of Unicode that stores each character in one to four eight-bit bytes:
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
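To see what the UTF-8 encoding named in that first line means in practice, here is a minimal Python sketch. The helper name utf8_length and the sample characters are my own choices for illustration; they are not part of the example document.

```python
# Count how many bytes UTF-8 needs for characters from different Unicode ranges.
# The helper name and the sample characters are illustrative assumptions.

def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "Latin capital letter A",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "grinning face emoji",
}

for ch, description in samples.items():
    print(f"{description}: {utf8_length(ch)} byte(s)")
```

Plain ASCII text such as the example document costs one byte per character, which is why UTF-8 is a compact choice for mostly English markup.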
Next, I create a new tag named <DOCUMENT>. You can use any name, not just DOCUMENT, for a tag, as long as the name starts with a letter or underscore (_), and the following characters consist of letters, digits, underscores, dots (.), or hyphens (-), but no spaces. In XML, tags always start with < and end with >.
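The naming rule just described can be sketched as a regular expression in Python. This is a simplification under stated assumptions: it accepts ASCII letters only, although XML allows many more Unicode letters, and the function name is_simple_tag_name is hypothetical.

```python
import re

# Simplified form of the naming rule in the text: the first character is a
# letter or underscore; later characters are letters, digits, underscores,
# dots, or hyphens. ASCII-only is an assumption made to keep the sketch short.
TAG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_tag_name(name: str) -> bool:
    """Return True if the whole string fits the simplified naming rule."""
    return TAG_NAME.fullmatch(name) is not None

for candidate in ["DOCUMENT", "_note", "my-tag.v2", "2nd", "MY TAG"]:
    print(candidate, is_simple_tag_name(candidate))
```

The last two candidates fail: one starts with a digit and the other contains a space.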
XML documents are made up of XML elements, and you create an XML element with an opening tag, such as <DOCUMENT>, followed by the element's content (if any), such as text or other elements, and ending with the matching closing tag that starts with </, such as </DOCUMENT>. You enclose the entire document, except for the <?xml ...?> line at the top and any processing instructions, in one element, called the root element, and that's the <DOCUMENT> element here:
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    .
    .
    .
</DOCUMENT>
Now I'll add a new element, <GREETING>, that encloses text content (in this case, "Hello From XML") within this XML document as follows:
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    .
    .
    .
</DOCUMENT>
Next, I add another element, <MESSAGE>, which also encloses text content, like this:
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
Now the <DOCUMENT> root element contains two elements: <GREETING> and <MESSAGE>. Each of the <GREETING> and <MESSAGE> elements, in turn, holds text. In this way, I've created a new XML document.
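If you want to confirm that structure from a program, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module to parse the document and list the root element's children. The variable names are my own.

```python
import xml.etree.ElementTree as ET

# The example document from the text, stored as a string.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>"""

# fromstring() parses the text and returns the root element.
root = ET.fromstring(doc)

# Iterating over an element yields its child elements in document order.
children = [(child.tag, child.text.strip()) for child in root]

print("Root element:", root.tag)
for tag, text in children:
    print(f"  {tag}: {text}")
```

The strip() call removes the line breaks and indentation that surround the text inside each element.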
There's more to the story, however: XML documents can also be well-formed and valid.
Well-Formed XML Documents
To be well-formed, an XML document must follow the syntax rules set up for XML by the W3C in the XML 1.0 recommendation (which you can find at http://www.w3.org/TR/REC-xml). Informally, "well-formed" means mostly that the document must contain one or more elements, and one element, the root element, must contain all the other elements. Also, each element must nest properly inside any enclosing elements. For example, the following document is not well-formed, because the </GREETING> closing tag comes after the opening <MESSAGE> tag for the next element:
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    <MESSAGE>
    </GREETING>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
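One quick way to test well-formedness is to hand the text to a parser and see whether it complains. The sketch below uses Python's standard xml.etree.ElementTree module, which raises ParseError on documents that are not well-formed; the function name is_well_formed and the shortened sample strings are my own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Properly nested elements.
good = ("<DOCUMENT><GREETING>Hello From XML</GREETING>"
        "<MESSAGE>Welcome</MESSAGE></DOCUMENT>")

# </GREETING> appears after <MESSAGE> has opened, so the nesting is broken.
bad = ("<DOCUMENT><GREETING>Hello From XML<MESSAGE></GREETING>"
       "Welcome</MESSAGE></DOCUMENT>")

print(is_well_formed(good))
print(is_well_formed(bad))
```

This checks well-formedness only; it says nothing about whether a document is valid.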
Valid XML Documents
Most XML browsers will check your document to see whether it is well-formed. Some of them can also check whether it's valid. An XML document is valid if a Document Type Definition (DTD) or XML schema is associated with it, and if the document complies with that DTD or schema. That is, the DTD or schema specifies a set of rules for the document's own internal consistency, and if the browser can confirm that the document follows those rules, the document is valid.
XML schemas are gaining popularity, and much more support for schemas is coming in XSLT 2.0 (in fact, supporting XML schemas is the motivating force behind XSLT 2.0), but DTDs are still the most commonly used tools for ensuring validity. DTDs can be stored in a separate file, or they can be stored in the document itself, in a <!DOCTYPE> declaration. This example adds a <!DOCTYPE> declaration to the example XML document we developed:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="first.css"?>
<!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (GREETING, MESSAGE)>
    <!ELEMENT GREETING (#PCDATA)>
    <!ELEMENT MESSAGE (#PCDATA)>
]>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
This book does not cover DTDs (see Inside XML for all the details on DTDs), but what this DTD says is that you can have <GREETING> and <MESSAGE> elements inside a <DOCUMENT> element, that the <DOCUMENT> element is the root element, and that the <GREETING> and <MESSAGE> elements can hold text.
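To make those three rules concrete, here is a rough Python sketch that checks them by hand. It is not a DTD validator: Python's standard xml.etree.ElementTree module does not validate against DTDs, so the function below (a hypothetical name, follows_example_dtd) simply hard-codes what this particular DTD says about element order and content.

```python
import xml.etree.ElementTree as ET

def follows_example_dtd(xml_text: str) -> bool:
    """Hand-coded check that mirrors the three <!ELEMENT> rules in the text."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed, so it cannot be valid

    # <!ELEMENT DOCUMENT (GREETING, MESSAGE)>:
    # the root is DOCUMENT and holds exactly these two children, in this order.
    if root.tag != "DOCUMENT":
        return False
    if [child.tag for child in root] != ["GREETING", "MESSAGE"]:
        return False

    # <!ELEMENT GREETING (#PCDATA)> and <!ELEMENT MESSAGE (#PCDATA)>:
    # each child holds text only, so it must have no child elements of its own.
    return all(len(child) == 0 for child in root)

ok = "<DOCUMENT><GREETING>Hi</GREETING><MESSAGE>Welcome</MESSAGE></DOCUMENT>"
swapped = "<DOCUMENT><MESSAGE>Welcome</MESSAGE><GREETING>Hi</GREETING></DOCUMENT>"

print(follows_example_dtd(ok))
print(follows_example_dtd(swapped))
```

A real validating parser reads the rules from the DTD itself instead of having them written into the program, which is the whole point of using a DTD.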
You can have all kinds of hierarchies in XML documents, where one element encloses another, down to many levels deep. You can also give elements attributes, like this: <CIRCLE COLOR="blue">, where the COLOR attribute holds the value "blue". You can use such attributes to store additional data about elements. You can also include comments in XML documents that explain more about specific elements by enclosing comment text inside <!-- and -->.
Here's an example of an XML document, planets.xml, that puts these features to work by storing data about the planets Mercury, Venus, and Earth, such as their mass, length of their day, density, distance from the sun, and so on. This document is used throughout the book, because it includes most of the XML features you'll work with in a short, compact form:
Listing 1.1 planets.xml
<?xml version="1.0"?>
<PLANETS>

    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>

    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DAY UNITS="days">116.75</DAY>
        <RADIUS UNITS="miles">3716</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>

    <PLANET>
        <NAME>Earth</NAME>
        <MASS UNITS="(Earth = 1)">1</MASS>
        <DAY UNITS="days">1</DAY>
        <RADIUS UNITS="miles">2107</RADIUS>
        <DENSITY UNITS="(Earth = 1)">1</DENSITY>
        <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
    </PLANET>

</PLANETS>
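As a quick illustration of how a program sees nested elements and attributes, the following Python sketch reads each planet's name and day length, along with the UNITS attribute, from a trimmed copy of planets.xml. Only the NAME and DAY elements are kept to save space; the dictionary name days is my own.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of planets.xml: only NAME and DAY are kept for brevity.
planets_xml = """<?xml version="1.0"?>
<PLANETS>
    <PLANET>
        <NAME>Mercury</NAME>
        <DAY UNITS="days">58.65</DAY>
    </PLANET>
    <PLANET>
        <NAME>Venus</NAME>
        <DAY UNITS="days">116.75</DAY>
    </PLANET>
    <PLANET>
        <NAME>Earth</NAME>
        <DAY UNITS="days">1</DAY>
    </PLANET>
</PLANETS>"""

root = ET.fromstring(planets_xml)

days = {}
for planet in root.findall("PLANET"):      # each <PLANET> child of <PLANETS>
    name = planet.findtext("NAME")         # text content of <NAME>
    day = planet.find("DAY")               # the <DAY> element itself
    days[name] = (float(day.text), day.get("UNITS"))  # value plus its attribute

for name, (length, units) in days.items():
    print(f"{name}: {length} {units}")
```

Element text comes from the text property, and attribute values come from get(), which mirrors the distinction between element content and attributes in the listing.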
You also need to understand a few XML definitions in this book:
CDATA. Simple character data (that is, text that does not include any markup).
ID. A proper XML name, which must be unique (that is, not shared by any other attribute of the ID type).
IDREF. Holds the value of an ID attribute of some element, usually another element that the current element is related to.
IDREFS. Multiple IDs of elements separated by whitespace.
NAME Character. A letter, digit, period, hyphen, underscore, or colon.
NAME. An XML name, which must start with a letter, an underscore, or a colon, optionally followed by additional name characters.
NAMES. A list of names, separated by whitespace.
NMTOKEN. A token made up of one or more letters, digits, hyphens, underscores, colons, and periods.
NMTOKENS. Multiple name tokens (NMTOKEN values) in a list, separated by whitespace.
NOTATION. A notation name (which must be declared in the DTD).
PCDATA. Parsed character data. PCDATA does not include any markup, and any entity references have been expanded already in PCDATA.
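To make the PCDATA entry concrete, here is a small Python sketch showing that by the time a parser hands you an element's text, entity references such as &amp; and &lt; have already been expanded. The sample sentence is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The source text uses the predefined entity references &amp; and &lt;.
message = ET.fromstring("<MESSAGE>Fish &amp; chips cost &lt; 5 pounds</MESSAGE>")

# The parsed character data contains the expanded characters, not the references.
print(message.text)
```

The raw document must write & and < as entity references, because those characters would otherwise be read as the start of markup.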
That gives us an overview of XML documents, including what a well-formed and valid document is. If you don't feel you're up to speed on XML documents, read another book on the subject, such as Inside XML. You might also look at some of the XML resources on the Web:
http://www.w3c.org/xml. The World Wide Web Consortium's main XML site, the starting point for all things XML.
http://www.w3.org/XML/1999/XML-in-10-points. "XML In 10 Points" (actually only seven); an XML overview.
http://www.w3.org/TR/REC-xml. This is the official W3C recommendation for XML 1.0, the version used throughout this book. Not terribly easy to read.
http://www.w3.org/TR/xml-stylesheet/. All about using stylesheets and XML.
http://www.w3.org/TR/REC-xml-names/. All about XML namespaces.
http://www.w3.org/XML/Activity.html. An overview of current XML activity at W3C.
http://www.w3.org/TR/xmlschema-0/, http://www.w3.org/TR/xmlschema-1/, and http://www.w3.org/TR/xmlschema-2/. XML schemas, the alternative to DTDs.
http://www.w3.org/TR/xlink/. The XLinks specification.
http://www.w3.org/TR/xptr. The XPointers specification.
http://www.w3.org/TR/xhtml1/. The XHTML 1.0 specification.
http://www.w3.org/TR/xhtml11/. The XHTML 1.1 specification.
http://www.w3.org/DOM/. The W3C Document Object Model, DOM.
So, now you've created XML documents. How can you take a look at them?