Transforming Paragraphs
The preceding section showed how you can generate simple <P> tags from Word paragraphs. In most cases, you’d want to perform at least two additional transformation steps:
- Generate heading tags (<H1>, <H2>, <H3>, etc.) based on the Word outline level.
- For all other paragraph types, include the name of the Word paragraph style as the class attribute of the generated <P> tag.
The change needed to generate the class attribute is simple. The Word style name is stored in the w:val attribute of the paragraph’s w:pPr/w:pStyle child. If the paragraph has no style, or its style is one of a few predefined values (such as normal or BodyText), no class attribute is emitted; otherwise, the style name becomes the class attribute of the <P> tag (see Listing 4).
Listing 4 Word paragraph style converted into the class attribute of the <P> tag.
<xsl:template match="w:p[ancestor::w:body]">
  <p>
    <!-- Style name of the current paragraph (empty if none is set) -->
    <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
    <xsl:choose>
      <!-- No style, or a default style: emit no class attribute -->
      <xsl:when test="not($paraStyle)" />
      <xsl:when test="$paraStyle = 'normal' or $paraStyle = 'BodyText'" />
      <xsl:otherwise>
        <xsl:attribute name="class"><xsl:value-of select="$paraStyle" /></xsl:attribute>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:apply-templates />
  </p>
</xsl:template>
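To make the rule concrete, here is a small hand-written WordprocessingML fragment with the output Listing 4 would be expected to produce for each paragraph. The fragment is not taken from a real document, and the style name Quote is made up for illustration. It also assumes that the w prefix is bound to the Word XML namespace and that the remaining templates simply pass the run text through, as in the basic transformation.

```xml
<!-- Hypothetical input: three paragraphs somewhere inside w:body -->
<w:p>
  <w:r><w:t>No explicit style.</w:t></w:r>
</w:p>
<w:p>
  <w:pPr><w:pStyle w:val="BodyText" /></w:pPr>
  <w:r><w:t>Body text style.</w:t></w:r>
</w:p>
<w:p>
  <w:pPr><w:pStyle w:val="Quote" /></w:pPr>
  <w:r><w:t>A custom style.</w:t></w:r>
</w:p>

<!-- Expected output (whitespace aside):
<p>No explicit style.</p>
<p>Body text style.</p>
<p class="Quote">A custom style.</p>
-->
```

Only the third paragraph gets a class attribute: the first has no w:pStyle child at all, and the second uses one of the style names the template treats as a default.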
To transform Word outline paragraph styles into HTML heading tags, we extract the paragraph style name as just described and look up the w:pPr/w:outlineLvl child of the corresponding w:style element. The transformation generates an HTML heading tag if the selected style has an outline level, and a regular <P> tag otherwise. To avoid a large xsl:choose block with one branch per heading level, we’ll use the xsl:element instruction to create the output HTML tags dynamically (see Listing 5). You can also download the complete XSL template for this example.
Listing 5 Translate the w:p tags into headings and regular paragraphs.
<xsl:template match="w:p[ancestor::w:body]">
  <xsl:variable name="paraStyle" select="w:pPr/w:pStyle/@w:val" />
  <!-- Outline level defined by the paragraph's style (empty if the style has none) -->
  <xsl:variable name="outLvl"
      select="//w:style[@w:type = 'paragraph' and @w:styleId = $paraStyle]/w:pPr/w:outlineLvl/@w:val" />
  <!-- Output element name: h1, h2, ... for outline styles, p otherwise -->
  <xsl:variable name="elName">
    <xsl:choose>
      <xsl:when test="$outLvl">h<xsl:value-of select="$outLvl + 1" /></xsl:when>
      <xsl:otherwise>p</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:element name="{$elName}">
    <xsl:choose>
      <!-- Headings never get a class attribute -->
      <xsl:when test="$elName != 'p'" />
      <xsl:when test="not($paraStyle)" />
      <xsl:when test="$paraStyle = 'normal' or $paraStyle = 'BodyText'" />
      <xsl:otherwise>
        <xsl:attribute name="class"><xsl:value-of select="$paraStyle" /></xsl:attribute>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:apply-templates />
  </xsl:element>
</xsl:template>
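The style lookup in Listing 5 is easier to follow against a concrete document. The heavily trimmed skeleton below is a hand-written illustration, not output saved from Word. It assumes the Word 2003 XML format, where style definitions and body text live in the same file, and the style IDs Heading1 and Quote are example values.

```xml
<!-- Hypothetical, heavily trimmed document -->
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:styles>
    <!-- A style with an outline level: paragraphs using it become headings -->
    <w:style w:type="paragraph" w:styleId="Heading1">
      <w:pPr><w:outlineLvl w:val="0" /></w:pPr>
    </w:style>
    <!-- A style without an outline level: paragraphs using it stay <p> -->
    <w:style w:type="paragraph" w:styleId="Quote" />
  </w:styles>
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1" /></w:pPr>
      <w:r><w:t>Introduction</w:t></w:r>
    </w:p>
    <w:p>
      <w:pPr><w:pStyle w:val="Quote" /></w:pPr>
      <w:r><w:t>A quoted paragraph.</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>

<!-- Expected output for the two paragraphs (whitespace aside):
<h1>Introduction</h1>
<p class="Quote">A quoted paragraph.</p>
-->
```

The first paragraph resolves to outline level 0, so the element name becomes h1 and the class attribute is skipped. The second finds no w:outlineLvl, falls back to p, and keeps its style name as the class.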
As you can see, the paragraph transformation is becoming more and more complex:
- The paragraph style is extracted into the paraStyle variable.
- The outline level is extracted into the outLvl variable. The select attribute of the xsl:variable instruction finds the w:style node that defines a paragraph style (w:type equal to paragraph) and whose w:styleId equals our paragraph style, then extracts the w:val attribute of its w:pPr/w:outlineLvl child. If that child doesn’t exist, the outLvl variable is an empty node-set.
- The elName variable is set to h followed by the outline level plus one if the outLvl variable is set (outline level 0 becomes h1, level 1 becomes h2, and so on), or to p (regular paragraph) otherwise.
- An output element is generated with the xsl:element instruction (replacing the <P> tag from the previous transformation) using elName as its name. Because the name attribute is an attribute value template, $elName has to be placed in braces ({}) so that it is evaluated as an expression instead of being used as the literal element name.
- The rest of the transformation is the same as in Listing 4; the additional first xsl:when instruction (testing $elName != 'p') skips the generation of the class attribute for HTML headings.
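One case the steps above do not cover: Word offers more outline levels than the six heading levels HTML defines, so a style with outline level 6 or higher would produce h7 and beyond. The following variant of the elName variable caps the result at h6. It is a sketch of one possible guard, not part of the original listing, and choosing h6 as the fallback is an assumption.

```xml
<!-- Sketch: drop-in replacement for the elName variable in Listing 5 -->
<xsl:variable name="elName">
  <xsl:choose>
    <!-- Outline levels above 5 would map to h7 or higher; cap them at h6 -->
    <xsl:when test="$outLvl &gt; 5">h6</xsl:when>
    <xsl:when test="$outLvl">h<xsl:value-of select="$outLvl + 1" /></xsl:when>
    <xsl:otherwise>p</xsl:otherwise>
  </xsl:choose>
</xsl:variable>
```

The extra test is false when outLvl is empty, so paragraphs without an outline level still fall through to p.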