Formatting Text Ranges
The basic text-formatting element of WordprocessingML is the text range (the w:r tag, which the schema itself calls a run). You’ll find this element within paragraphs (w:p tags) or Word fields (w:fldSimple tags). The w:r tag contains the actual text, embedded in the w:t tag (covered in the previous sections), as well as range properties expressed as children of the w:rPr (range properties) tag. For example:
- The w:b tag indicates that the range should be bold; the w:i tag indicates it should (also) be italic.
- The w:rStyle tag associates a text style with the range.
- w:color defines the range’s text color.
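As a rough illustration of how these properties appear in a document, the fragment below shows a paragraph with one plain range and one formatted range. The style name, color value, and text are invented for this sketch, and the namespace URI is the Word 2003 one, which is an assumption; substitute whichever URI your documents bind to the w: prefix.

```xml
<!-- Hypothetical sample: values are illustrative, not taken from a real document -->
<w:p xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <!-- A range without w:rPr: plain text -->
  <w:r>
    <w:t>Plain text, </w:t>
  </w:r>
  <!-- A range with a text style, bold, italic, and a font color -->
  <w:r>
    <w:rPr>
      <w:rStyle w:val="Emphasis"/>
      <w:b/>
      <w:i/>
      <w:color w:val="FF0000"/>
    </w:rPr>
    <w:t>formatted text</w:t>
  </w:r>
</w:p>
```

Note that the color is stored as bare hexadecimal digits, without the leading # that CSS expects.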
Transforming text ranges into HTML is easy if you don’t care about semantic markup. For every range with properties that interest you, create a SPAN element and set its class and style attributes:
- Set the SPAN’s class to the value of the w:rStyle tag.
- Set font-weight to bold if the range has a w:b property tag.
- Set font-style to italic if the range has a w:i property tag.
- Set the color property in the SPAN’s style attribute to the value of the w:color tag (typically a hexadecimal RRGGBB value, which needs a leading # in CSS).
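To make these rules concrete, here is the kind of markup they aim for, given a hypothetical range that uses a text style named Emphasis, is both bold and italic, and has the color value FF0000. The style name, color, and text are assumptions made for this sketch; the leading # is added because Word stores the color as bare hexadecimal digits.

```xml
<!-- Target markup for a hypothetical bold, italic, red range styled "Emphasis" -->
<span class="Emphasis"
      style="font-weight: bold; font-style: italic; color: #FF0000;">formatted text</span>
```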
Listing 6 shows this transformation. The transformation (download the full XSL stylesheet) checks for interesting range property tags and creates a SPAN element if they’re present (setting its class and style attributes), or just outputs the range text otherwise.
Listing 6 Translate the text range formatting into SPAN elements.
<xsl:template match="w:r">
  <xsl:choose>
    <!-- Wrap the range in a SPAN only if it has a property we care about -->
    <xsl:when test="w:rPr/w:b or w:rPr/w:i or w:rPr/w:rStyle or w:rPr/w:color">
      <span>
        <xsl:if test="w:rPr/w:rStyle">
          <xsl:attribute name="class">
            <xsl:value-of select="w:rPr/w:rStyle/@w:val" />
          </xsl:attribute>
        </xsl:if>
        <xsl:attribute name="style">
          <xsl:if test="w:rPr/w:b">font-weight: bold;</xsl:if>
          <xsl:if test="w:rPr/w:i">font-style: italic;</xsl:if>
          <!-- w:color holds bare hex digits (or "auto"), so add the # that CSS requires -->
          <xsl:if test="w:rPr/w:color[@w:val != 'auto']">color: #<xsl:value-of select="w:rPr/w:color/@w:val" />;</xsl:if>
        </xsl:attribute>
        <xsl:apply-templates />
      </span>
    </xsl:when>
    <!-- No interesting properties: output the range text as-is -->
    <xsl:otherwise><xsl:apply-templates /></xsl:otherwise>
  </xsl:choose>
</xsl:template>
A more meaningful transformation would turn bold text ranges into STRONG elements and italicized text into EM elements, as follows:
- A SPAN element is still needed if the text range is formatted with a Word text style or has font color definitions (although you might optimize this rule by attaching class and style attributes to STRONG or EM tags).
- If the text range has a w:b property, a STRONG tag is generated. If the w:i property is present, an EM tag is generated (within the STRONG tag, if both properties are present).
Listing 7 includes the enhanced transformation. You can also download the whole XSL transformation document.
Listing 7 Translate the text range formatting into semantic markup elements.
<!-- Step 1: create the SPAN if the range has a text style or a font color -->
<xsl:template match="w:r">
  <xsl:choose>
    <xsl:when test="w:rPr/w:rStyle or w:rPr/w:color">
      <span>
        <xsl:if test="w:rPr/w:rStyle">
          <xsl:attribute name="class"><xsl:value-of select="w:rPr/w:rStyle/@w:val" /></xsl:attribute>
        </xsl:if>
        <!-- w:color holds bare hex digits (or "auto"), so add the # that CSS requires -->
        <xsl:if test="w:rPr/w:color[@w:val != 'auto']">
          <xsl:attribute name="style">color: #<xsl:value-of select="w:rPr/w:color/@w:val" />;</xsl:attribute>
        </xsl:if>
        <xsl:call-template name="isRangeBold" />
      </span>
    </xsl:when>
    <xsl:otherwise><xsl:call-template name="isRangeBold" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- Step 2: create the STRONG element if the range is bold -->
<xsl:template name="isRangeBold">
  <xsl:choose>
    <xsl:when test="w:rPr/w:b"><strong><xsl:call-template name="isRangeItalic" /></strong></xsl:when>
    <xsl:otherwise><xsl:call-template name="isRangeItalic" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- Steps 3 and 4: create the EM element if the range is italic, then output the text -->
<xsl:template name="isRangeItalic">
  <xsl:choose>
    <xsl:when test="w:rPr/w:i"><em><xsl:apply-templates /></em></xsl:when>
    <xsl:otherwise><xsl:apply-templates /></xsl:otherwise>
  </xsl:choose>
</xsl:template>
The text range transformation is performed in four steps:
- Create the SPAN tag, if needed.
- Create the STRONG tag, if needed.
- Create the EM tag, if needed.
- Output the actual text by using the xsl:apply-templates instruction.
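As a quick check of these steps, consider a hypothetical range that is bold and italic but has no text style or color. The sample text and the Word 2003 namespace URI are assumptions made for this sketch, and the two fragments are shown together only for comparison. Whitespace aside, the steps should skip the SPAN and nest the remaining elements like this:

```xml
<!-- Input: a range with w:b and w:i only (hypothetical sample) -->
<w:r xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:rPr>
    <w:b/>
    <w:i/>
  </w:rPr>
  <w:t>important</w:t>
</w:r>

<!-- Output: step 1 creates nothing, step 2 creates STRONG,
     step 3 creates EM, step 4 outputs the text -->
<strong><em>important</em></strong>
```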
Each step contains a conditional (the xsl:choose instruction), generating an extra tag in only one branch and calling the next step in both branches. While this technique might seem as verbose to you as a COBOL program to a C++ programmer (and you would be right), it’s the easiest way to approach this problem in XSL: a template cannot emit an opening tag in one instruction and its closing tag in another, so every optional wrapper element needs its own pair of branches.
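The optimization mentioned earlier, attaching the class and style attributes to the STRONG or EM tag instead of generating a separate SPAN, could be sketched roughly as follows. This is not the article's downloadable stylesheet, just one possible approach under stated assumptions: the rangeAttributes helper is a hypothetical name, the namespace URI is the Word 2003 one (substitute your own), and mutually exclusive match patterns are swapped in for the chained xsl:choose steps.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">

  <xsl:output method="html" />

  <!-- Hypothetical helper: adds class/style attributes to the element
       currently being built. Must be called before any child content.
       The context node is the w:r element. -->
  <xsl:template name="rangeAttributes">
    <xsl:if test="w:rPr/w:rStyle">
      <xsl:attribute name="class">
        <xsl:value-of select="w:rPr/w:rStyle/@w:val" />
      </xsl:attribute>
    </xsl:if>
    <!-- Skip "auto"; other values are bare hex digits that need a # in CSS -->
    <xsl:if test="w:rPr/w:color[@w:val != 'auto']">
      <xsl:attribute name="style">
        <xsl:text>color: #</xsl:text>
        <xsl:value-of select="w:rPr/w:color/@w:val" />
        <xsl:text>;</xsl:text>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>

  <!-- Bold (italic or not): STRONG is outermost and carries the attributes -->
  <xsl:template match="w:r[w:rPr/w:b]">
    <strong>
      <xsl:call-template name="rangeAttributes" />
      <xsl:choose>
        <xsl:when test="w:rPr/w:i"><em><xsl:apply-templates /></em></xsl:when>
        <xsl:otherwise><xsl:apply-templates /></xsl:otherwise>
      </xsl:choose>
    </strong>
  </xsl:template>

  <!-- Italic only: EM carries the attributes -->
  <xsl:template match="w:r[w:rPr/w:i and not(w:rPr/w:b)]">
    <em>
      <xsl:call-template name="rangeAttributes" />
      <xsl:apply-templates />
    </em>
  </xsl:template>

  <!-- Neither bold nor italic, but styled or colored: fall back to SPAN -->
  <xsl:template match="w:r[not(w:rPr/w:b or w:rPr/w:i) and (w:rPr/w:rStyle or w:rPr/w:color)]">
    <span>
      <xsl:call-template name="rangeAttributes" />
      <xsl:apply-templates />
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Ranges that match none of these patterns fall through to the built-in template rules, which simply output their text. The trade-off is that the three patterns must be kept mutually exclusive by hand, whereas the chained steps handle every combination with one branch per property.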