5.2 Communicating with the XSLT processor
5.2.1 Serializing the result tree
The <xsl:output> top-level element:
-
is a request to serialize the result tree as a sequence of bytes.
-
The XSLT processor may choose to respect the request, but is not obliged.
All attributes of <xsl:output> are optional.
-
The method="method-indication" attribute may take these values:
-
method="html"
-
uses the HTML vocabulary and SGML markup conventions, namely:
-
empty elements,
-
attribute minimization,
-
built-in character entity referencing (ISO Latin 1),
-
the entire set of HTML conventions (you cannot selectively turn on only a subset of them);
-
all conventions are used according to common practice,
-
-
-
is considered the default when all of these result tree conditions hold:
-
the name of the document element node is HTML (case insensitive);
-
the null namespace URI is used for the name (i.e., there is no namespace prefix);
-
any text nodes preceding the document element contain only whitespace,
-
method="xml"
-
uses arbitrary vocabulary and XML markup conventions, namely:
-
empty elements,
-
built-in character entity referencing,
-
is the default when the default isn't HTML,
-
method="text"
-
uses no vocabulary and no lexical or syntactic conventions,
-
serializes only the text nodes of every element in the result tree,
-
outputs all characters in clear text (no entities of any kind),
-
is never the default,
-
method="prefix:processor-recognized-method-name"
-
uses the prefix defined by xmlns:prefix="processor-recognized-URI-reference";
-
uses lexical and syntactic conventions recognized by the XSLT processor;
-
in particular, serialization can be arbitrary (it is out of the scope of XSLT);
-
-
is never the default.
-
Attributes related to the method are as follows:
-
version="numeric-version"
-
specifies the version of the output method,
-
-
omit-xml-declaration="yes" or omit-xml-declaration="no"
-
specifies the absence or presence of the XML declaration (if the result tree represents a document entity) or the text declaration (if the result tree represents an external general parsed entity),
-
-
standalone="yes" or standalone="no"
-
specifies the presence or absence of a standalone document declaration,
-
-
doctype-system="system-identifier"
-
specifies the system identifier to use in the DOCTYPE declaration,
-
-
doctype-public="public-identifier"
-
specifies the public identifier to use in the DOCTYPE declaration,
-
requires doctype-system= to also be specified if the output method is XML.
-
-
-
Attributes related to the serialized markup syntax are as follows:
-
indent="yes"
-
asks the XSLT processor (at its discretion) to indent the result "nicely" with additional whitespace when using the xml method;
-
this may have implications for the downstream parsing processes if the whitespace is considered significant,
-
-
-
cdata-section-elements="list-of-element-type-names"
-
gives a whitespace-separated list of element types possibly used in the result,
-
specifies those result tree elements whose text content is serialized within a CDATA section.
-
-
-
Attributes related to the encoding are as follows:
-
encoding="encoding"
-
requests (if supported by the processor) the character encoding used for the emitted result tree,
-
has a value that should match the encoding= pseudo-attribute described by the XML Recommendation for the XML declaration,
-
-
media-type="media-type"
-
specifies the MIME content type (without specifying the charset parameter).
-
-
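As a minimal sketch of how these attributes combine, the stylesheet below requests XML serialization with a DOCTYPE declaration, indentation, and CDATA sections. The file name report.dtd, the public identifier, and the element names code and sample are hypothetical, chosen only for illustration; the processor remains free to ignore any of these requests.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Every attribute is optional and is only a request to the processor. -->
  <!-- doctype-public is accompanied by doctype-system, as the xml method requires. -->
  <!-- The DTD name, public identifier, and element names are hypothetical. -->
  <xsl:output method="xml"
              version="1.0"
              encoding="iso-8859-1"
              omit-xml-declaration="no"
              standalone="no"
              doctype-public="-//Example//DTD Report 1.0//EN"
              doctype-system="report.dtd"
              indent="yes"
              cdata-section-elements="code sample"
              media-type="text/xml"/>

  <!-- Copy the source tree unchanged so only the serialization differs. -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Note that indent="yes" is best avoided when downstream processes treat whitespace as significant, for the reason given in the list above.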
5.2.2 Illustration of output methods
Consider a simple XML file nodein.xml created using ISO-8859-1 (Latin-1), the 8-bit ISO character set for Western European languages (note the copyright symbol seen here is encoded in the file as the single octet 0xA9):
Example 5-5 An XML source file with characters sensitive to processing
<?xml version="1.0" encoding="iso-8859-1"?>
<p>Test with © and &lt; and &amp; in it</p>
Figure 5-1 illustrates the node tree that is created by the XSLT processor.
Figure 5-1 Illustration of Node Tree Characters
Note how the markup used to represent the sensitive XML characters is lost. The node tree shown would also be created identically by the following markup:
Example 5-6 The same information using a CDATA section
<?xml version="1.0" encoding="iso-8859-1"?>
<p><![CDATA[Test with © and < and & in it]]></p>
All character values in text nodes are maintained as UCS-2 (Universal Character Set, Two Octet) characters. The UCS character set is a 32-bit (4-octet) repertoire with a 16-bit (2-octet) subset (equivalent to Unicode) that can be serialized either as 16-bit (2-octet) characters or, using an encoding called UTF-8, as a sequence of one or more 8-bit (1-octet) values per character.
Utilizing an extension element defined in XT providing for multiple result trees, one can copy the source node tree to each of three result trees, such that each result tree is identical to the source tree, and interpret each result tree differently:
Example 5-7 Emission of the source tree using three different output methods
<?xml version="1.0"?> <!--nodeout.xsl-->
<!--XSLT 1.0 - http://www.CraneSoftwrights.com/training -->
<!--XT (see http://www.jclark.com/xml/xt.html)-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"
                xmlns:xt="http://www.jclark.com/xt"
                extension-element-prefixes="xt">

<xsl:template match="/">
  <xt:document method="xml" href="nodeout.xml"
               omit-xml-declaration="yes">
    <xsl:copy-of select="."/>
  </xt:document>
  <xt:document method="html" href="nodeout.htm">
    <xsl:copy-of select="."/>
  </xt:document>
  <xt:document method="text" href="nodeout.txt">
    <xsl:copy-of select="."/>
  </xt:document>
</xsl:template>

</xsl:stylesheet>
There is a nuance here regarding the use of the extension element: each <xt:document> element creates a separate result tree; it is not itself a result tree element residing in a single "master" result tree, as it might appear.
The use of method="xml" emits the same nodes using the UCS characters of the text nodes while using the built-in XML entities where necessary:
Example 5-8 XML output method emission of sample instance
<p>Test with © and &lt; and &amp; in it</p>
Note that the two-octet UTF-8 representation of the copyright symbol is 0xC2 0xA9; in a non-UTF-8 presentation environment, such as an ISO-8859-1 (Latin-1) environment, both octets are revealed as separate characters:
<p>Test with Â© and &lt; and &amp; in it</p>
The use of method="html" recognizes known built-in HTML entities and uses the entity references where necessary:
Example 5-9 HTML output method emission of sample instance
<p>Test with &copy; and &lt; and &amp; in it</p>
The use of method="text" ignores all element start and end tags and puts out the UCS characters of all the text nodes while not using any built-in entities:
Example 5-10 Text output method emission of sample instance
Test with © and < and & in it
Note again that in a non-UTF-8 environment, such as an ISO-8859-1 (Latin-1) environment, the copyright symbol in this text file would appear as two characters:
Test with Â© and < and & in it
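One way to avoid the two-character appearance in a Latin-1 environment is to request a matching output encoding. The following is only a sketch using the standard <xsl:output> element rather than the XT extension element: a processor that supports and honors the request would be expected to write the copyright symbol as the single octet 0xA9 and to record the encoding in the XML declaration, but the processor is not obliged to do either.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Ask for Latin-1 output; keeping the XML declaration lets a
       downstream parser learn which encoding was actually used. -->
  <xsl:output method="xml"
              encoding="iso-8859-1"
              omit-xml-declaration="no"/>

  <!-- Copy the source tree to the result tree unchanged. -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```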
5.2.3 Communicating with the outside environment
The following facilities provide communication among the stylesheet, the XSLT processor, and the operator:
-
stylesheet to operator: <xsl:message>
-
it contains an arbitrary message such as
-
status of progress,
-
content violation;
-
-
the specific mechanism of communication is not standardized;
-
the processor may choose to not support relaying the message,
-
the content is any template (static or calculated),
-
this instruction can contain the terminate="yes" attribute
-
which gives an instruction to stop any further processing of the stylesheet and source files,
-
-
this instruction allows the stylesheet to report on semantic validation;
-
when content has been detected as being incorrect, messages can report problems to the operator;
-
well-formedness has already been determined by the XML processor inside the XSLT processor;
-
the stylesheet can also use XPath to check structural validity if the XSLT processor does not use a validating XML processor;
-
-
this instruction allows the stylesheet to report progress when manipulating large data sets,
-
-
operator to stylesheet: <xsl:param>
-
it provides an invocation-time parameterized value for a globally scoped bound variable;
-
the specific mechanism of communication is not standardized;
-
the processor may choose to not support value specification;
-
a default value can be specified should no value be supplied at invocation,
-
-
processor to stylesheet:
-
to obtain the value of a system property, use:
-
system-property('prefix:property-name')
-
use XSLT namespace to indicate reserved system properties:
-
xsl:version
-
returns a decimal number (not a string) of the XSLT processor's implementation level in order to test the level of functionality for a given stylesheet;
-
-
xsl:vendor and xsl:vendor-url
-
each returns a string indicating, respectively, the name and URL (Uniform Resource Locator, RFC 1738/RFC 1808/RFC 2396) of the vendor of the executing XSLT processor;
-
-
-
use other namespaces to indicate extension system properties:
-
xmlns:prefix="processor-recognized-URI-reference"
-
system-property('prefix:property-name')
-
the processor returns the empty string for an unrecognized property.
The following example illustrates how to tell the operator that the stylesheet uses features not supported by the processor.
Example 5-11 An example of utilizing available system properties
<xsl:choose>
  <xsl:when test="system-property('xsl:version') &gt;= 2.0">
    <!-- placeholder standing in for any XSLT 2.0 instruction -->
    <xsl:feature-of-2.0/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:message terminate="yes">
      Sorry, this stylesheet requires XSLT 2.0
      Complain to: '<xsl:value-of
           select="system-property('xsl:vendor')"/>'
      at '<xsl:value-of select="system-property('xsl:vendor-url')"/>'.
    </xsl:message>
  </xsl:otherwise>
</xsl:choose>
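The operator-to-stylesheet and stylesheet-to-operator channels can be combined, as in the following minimal sketch. The parameter name max-items and the order/item source vocabulary are hypothetical. How the operator supplies the parameter value, and how messages are relayed, depends on the processor in use.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Hypothetical global parameter: the operator may override it at
       invocation; the select value is the default used otherwise. -->
  <xsl:param name="max-items" select="10"/>

  <xsl:template match="/">
    <!-- Semantic check written in XPath: too many item children. -->
    <xsl:if test="count(/order/item) &gt; $max-items">
      <xsl:message terminate="yes">
        <xsl:text>Order has </xsl:text>
        <xsl:value-of select="count(/order/item)"/>
        <xsl:text> items; the limit is </xsl:text>
        <xsl:value-of select="$max-items"/>
      </xsl:message>
    </xsl:if>
    <!-- Progress report; processing continues after this message. -->
    <xsl:message>Validation passed; copying input.</xsl:message>
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Using terminate="yes" only on the violation message means that a valid input still produces a result tree, while an invalid one stops before any copying takes place.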
5.2.4 Uncontrolled processes
The Recommendation gives neither the user nor the stylesheet any control over, or communication about, the following processes implemented by the XSLT processor.
-
Result tree attribute order:
-
the XSLT processor may choose to serialize attribute nodes found in the result tree in any order.
-
-
Result tree serialization instance markup:
-
the XSLT processor may choose any way it desires to serialize the content of text nodes when the stylesheet does not instruct a given element to be emitted as a CDATA section, for example:
-
using XML built-in character entities for markup-sensitive characters,
-
using numeric character references for markup-sensitive characters or characters not present in the encoding character set,
-
using piecemeal CDATA sections;
-
-
any original markup syntax from the source file is lost when the source file is abstracted into the source node tree;
-
other than an entire element emitted as a CDATA section, there is no control available in the stylesheet over which serialization methods are used for text content.
-
-
Result tree construction:
-
the stylesheet writer is responsible for dictating the final content of the result tree;
-
the XSLT processor can use any means to effect the final result as described in the Recommendation without necessarily implementing the prose description found therein;
-
the side-effect free nature of the XSLT design (including the inability to change the value of bound variables) allows an XSLT processor to process portions of the input in parallel and combine the intermediate results into the single final result tree;
-
XSLT allows the processor to choose not to preserve the result node tree when serializing the transformed information to an output instance, thus the result may never actually exist as a complete tree within the processor.
-
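To make the serialization choices above concrete, here is a sketch of forms that a processor could legitimately emit for the same result tree content. The element and attribute names are hypothetical. The three p elements carry identical text as far as a downstream XML parser is concerned, and the same is true of the two img elements.

```xml
<samples>
  <!-- Built-in entity references for markup-sensitive characters -->
  <p>a &lt; b &amp; c</p>
  <!-- Numeric character references for the same characters -->
  <p>a &#60; b &#38; c</p>
  <!-- Piecemeal CDATA sections around each sensitive character -->
  <p>a <![CDATA[<]]> b <![CDATA[&]]> c</p>
  <!-- Attribute order is also the processor's choice -->
  <img src="logo.png" alt="logo"/>
  <img alt="logo" src="logo.png"/>
</samples>
```

Because these forms are equivalent to an XML parser, any downstream process that depends on one particular form is relying on behavior the stylesheet cannot guarantee.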