Transforming XML Documents
Now that we've covered parsing, generating, and searching an XML document, we turn to another very important part of XML processing: XML transformation. You can transform (in other words, convert) an XML document to another format by applying various transformation techniques. The result can be another XML document, perhaps a subset of the original with certain information filtered out, or an entirely different data format, such as HTML, eXtensible HyperText Markup Language (XHTML), Wireless Markup Language (WML), CSV, and so forth.
Why would you want to convert a neatly formatted XML document to another format? After all, we can read and understand the data as-is. The answer is that although you, as a technically savvy individual, took the time to learn XML and can now read it, most people would prefer their data presented in a user-friendly format.
One of the most popular methods of transforming XML to another format is to use the eXtensible Stylesheet Language for Transformations (XSLT) standard. An XSLT processor provides the interface needed to transform XML data. One of the processor's components is the XML parser itself, which the XSLT processor needs in order to understand and interact with XML. This makes sense if you think about it: the XSLT processor has to parse the document and extract the information it contains before it can format the contents. The processor also requires an XSLT stylesheet to perform the transformation. An XSLT stylesheet is itself an XML document that contains the rules used to perform the transformation. The XSLT processor uses XPath together with the rules in the stylesheet to transform the document: it searches through the XML data using XPath expressions to find the desired elements.
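The XPath lookup step described above can be sketched in a few lines. This is only an illustration, not an XSLT processor: it assumes Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath, and the helper name select_text is hypothetical. The sample data is the baseball document used later in this section.

```python
import xml.etree.ElementTree as ET

# The baseball statistics document used in this section.
XML_SOURCE = """<?xml version="1.0"?>
<career_statistics>
  <player>
    <name>Mickey Mantle</name>
    <team>NY Yankees</team>
    <home_runs>536</home_runs>
    <batting_average>.298</batting_average>
  </player>
  <player>
    <name>Joe DiMaggio</name>
    <team>NY Yankees</team>
    <home_runs>361</home_runs>
    <batting_average>.325</batting_average>
  </player>
</career_statistics>"""

# Parsing comes first: nothing can be selected until the document is a tree.
root = ET.fromstring(XML_SOURCE)


def select_text(node, path):
    """Return the text of every element matched by an XPath-style path."""
    return [element.text for element in node.findall(path)]


# Select every <name> that sits inside a <player>.
names = select_text(root, "./player/name")

# A predicate narrows the search to one player before stepping to <home_runs>.
dimaggio_home_runs = select_text(root, "./player[name='Joe DiMaggio']/home_runs")

print(names)
print(dimaggio_home_runs)
```

A real XSLT processor evaluates full XPath expressions in the same spirit: parse first, then walk the tree to locate the nodes each stylesheet rule asks for.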
NOTE
We will come back to XML transformations and cover them in greater detail in Chapter 8, "XML Transformation and Filtering."
One of the most common (and easily demonstrated) transformations in use today is from XML to HTML. While the data is transferred and processed in a nicely formatted XML document, the XML document must be transformed into HTML to display in the browser with any type of formatting.
We don't want you to have to wait until Chapter 8 to see an example using an XSLT processor, so let's take a look at a simple example of transforming an XML document to HTML. To provide a common starting point, we'll reuse the XML document we discussed earlier, and it is presented here again as Listing 2.3 for your convenience. For this example, let's say that your task is to convert an XML report into HTML, so it can be displayed on a web page. Listing 2.3 shows the XML document we would like to convert and display in HTML.
Listing 2.3 Career statistics for two Hall of Fame baseball players stored in an XML document. (Filename: ch2_baseball_stats.xml)
<?xml version="1.0"?>
<career_statistics>
   <player>
      <name>Mickey Mantle</name>
      <team>NY Yankees</team>
      <home_runs>536</home_runs>
      <batting_average>.298</batting_average>
   </player>
   <player>
      <name>Joe DiMaggio</name>
      <team>NY Yankees</team>
      <home_runs>361</home_runs>
      <batting_average>.325</batting_average>
   </player>
</career_statistics>
If you recall, the XML file in Listing 2.3 contains career statistics for two of the greatest baseball players ever to play the game. The data for each player includes the following elements: <name>, <team>, <home_runs>, and <batting_average>.
As mentioned earlier, to perform the transformation, the XSLT processor requires two inputs: the XML document and an XSLT stylesheet. The XSLT stylesheet contains formatting information that determines both the content and the format of the output HTML file. For example, if you want the text to appear in a certain color or font, or if you want the information displayed in a table with particular column headings, this information appears in the stylesheet. A sample XSLT stylesheet is shown in Listing 2.4.
Listing 2.4 XSLT stylesheet used to generate an output HTML file. (Filename: ch2_baseball_stats.xslt)
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="career_statistics">
<html>
   <body>
      <h2>Baseball Players</h2>
      <table border="1">
         <tr>
            <td>Name</td>
            <td>Teams</td>
            <td>Home Runs</td>
            <td>Batting Average</td>
         </tr>
         <xsl:for-each select="player">
         <tr>
            <td><xsl:value-of select="name"/></td>
            <td><xsl:value-of select="team"/></td>
            <td><xsl:value-of select="home_runs"/></td>
            <td><xsl:value-of select="batting_average"/></td>
         </tr>
         </xsl:for-each>
      </table>
   </body>
</html>
</xsl:template>
</xsl:stylesheet>
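To make the stylesheet's rules more concrete, here is a rough hand-written equivalent of what Listing 2.4 asks the processor to do. It is a sketch under stated assumptions, not a replacement for an XSLT processor: it uses Python's standard xml.etree.ElementTree module, the function name to_html is hypothetical, and the header row is copied from the output shown in Listing 2.5.

```python
import xml.etree.ElementTree as ET
from html import escape

# The source document from Listing 2.3.
XML_SOURCE = """<?xml version="1.0"?>
<career_statistics>
  <player>
    <name>Mickey Mantle</name>
    <team>NY Yankees</team>
    <home_runs>536</home_runs>
    <batting_average>.298</batting_average>
  </player>
  <player>
    <name>Joe DiMaggio</name>
    <team>NY Yankees</team>
    <home_runs>361</home_runs>
    <batting_average>.325</batting_average>
  </player>
</career_statistics>"""

# Column headings as they appear in the output of Listing 2.5.
HEADERS = ["Name", "Teams", "Home Runs", "Batting Average"]
# Child elements to copy, in column order (one xsl:value-of each).
FIELDS = ["name", "team", "home_runs", "batting_average"]


def to_html(xml_text):
    """Build the HTML table by hand, mirroring the stylesheet's rules."""
    root = ET.fromstring(xml_text)  # corresponds to match="career_statistics"
    lines = ["<html>", "<body>", "<h2>Baseball Players</h2>", '<table border="1">']
    lines.append("<tr>" + "".join("<td>{}</td>".format(h) for h in HEADERS) + "</tr>")
    for player in root.findall("player"):  # corresponds to xsl:for-each select="player"
        cells = "".join(
            "<td>{}</td>".format(escape(player.findtext(field, default="")))
            for field in FIELDS
        )
        lines.append("<tr>" + cells + "</tr>")
    lines += ["</table>", "</body>", "</html>"]
    return "\n".join(lines)


html_page = to_html(XML_SOURCE)
print(html_page)
```

The point of the comparison is that the stylesheet expresses the same loop and lookups declaratively, so changing the output means editing the stylesheet rather than program code.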
When we apply our custom XSLT stylesheet to the original XML document, an HTML file is generated that contains the data from the XML file, presented in a customized view. Remember, XML contains only data with no information about the formatting, so we're free to format it as we see fit. That's one of the best features of XML: you can use several XSLT stylesheets to transform the same XML source document into different output HTML files. For example, let's say that your company's annual report is stored in XML and that you are responsible for generating different HTML versions of it. One version of the report would be used for internal purposes, and the other version would appear on the public web site. The internal version may contain proprietary information that shouldn't be displayed on the public web site. To generate different HTML versions of the annual report from a single XML document, just define one stylesheet for each target report. The XSLT stylesheet that generates the public version can easily filter out any proprietary information.
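The one-source, several-views idea from the annual report example can be sketched as follows. Everything here is hypothetical and only for illustration: the <annual_report> document, its audience attribute, and the render function are invented, and the filtering is done with Python's standard xml.etree.ElementTree module rather than with real stylesheets. With XSLT, each view would instead be its own stylesheet selecting the sections it is allowed to show.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical report: each section is tagged with its intended audience.
REPORT_XML = """<annual_report>
  <section audience="public"><title>Revenue</title></section>
  <section audience="internal"><title>Pending Acquisitions</title></section>
</annual_report>"""


def render(xml_text, allowed_audiences):
    """Return an HTML list of the section titles visible to the given audiences."""
    root = ET.fromstring(xml_text)
    items = [
        "<li>{}</li>".format(escape(section.findtext("title", default="")))
        for section in root.findall("section")
        if section.get("audience") in allowed_audiences  # the filtering rule
    ]
    return "<ul>" + "".join(items) + "</ul>"


# Same source document, two different outputs.
public_html = render(REPORT_XML, {"public"})
internal_html = render(REPORT_XML, {"public", "internal"})

print(public_html)
print(internal_html)
```

The source document is never modified; only the rule that decides what to emit changes between the two views.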
After applying the stylesheet in Listing 2.4 to the XML file, the XSLT processor generated the HTML output shown in Listing 2.5. Don't worry about the underlying transformation technology at this point; we just wanted to illustrate the capability of transforming XML into other formats. This is covered in detail in Chapter 8, "XML Transformation and Filtering."
Listing 2.5 HTML document generated by using Perl and XSLT. (Filename: ch2_baseball_stats.html)
<html>
   <body>
      <h2>Baseball Players</h2>
      <table border="1">
         <tr>
            <td>Name</td>
            <td>Teams</td>
            <td>Home Runs</td>
            <td>Batting Average</td>
         </tr>
         <tr>
            <td>Mickey Mantle</td>
            <td>NY Yankees</td>
            <td>536</td>
            <td>.298</td>
         </tr>
         <tr>
            <td>Joe DiMaggio</td>
            <td>NY Yankees</td>
            <td>361</td>
            <td>.325</td>
         </tr>
      </table>
   </body>
</html>
Now that we have transformed the original file to HTML, it can easily be displayed in a web browser as shown in Figure 2.5. As you can see, the end result is a nicely formatted HTML table.
Figure 2.5 HTML document generated by an XSLT transformation.