XML Reference Guide

Mar 14, 2003


In this section of the XML and Web Services Guide, we are building a simple RSS feed reader using Ajax. In the previous entry, we created a basic page that uses asynchronous JavaScript to load new information such as subcategories for a particular category or feeds for a particular subcategory. The page requests the information using an HTTP request, and then adds it to the page. When we last left our project, we had brought it to the point at which we were requesting the actual RSS feed and displaying it, raw, on the page.

Of course, as just a jumble of text, the information isn't very useful. Instead, we want to take the raw XML and turn it into HTML. Now, you might think that this is a task for XSL Transformations. You'd be right. But we're not going to perform the transformation on the server. Instead, we're going to perform the transformation right in the browser.

Here's how the process is going to work:

1. Load the stylesheet when the browser originally loads the page.
2. Drill down to the feed level.
3. Download the feed.
4. Create a DOM Document out of the feed.
5. Use the stylesheet to transform the Document.
6. Display the transformed Document on the page.

Let me start by admitting that I'm going to cheat here, just a little. There are something like eight different RSS and RSS-like feed formats out there in the wild, and I could spend a large amount of time talking about the specifics of the actual XSLT stylesheet, but that's not what this entry is about. It's about performing a transformation -- any transformation -- in the browser. So instead, we'll create a simple HTML document using a simple stylesheet that pulls only the most basic of information from the most common of formats. (We'll leave the creation of a more comprehensive solution as an exercise for the reader.)

Let's start by taking a quick look at two of the most common formats, RSS 0.9x and RSS 1.0. A sample RSS 0.91 feed looks something like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="0.91">

<channel>
<title>The Vanguard Science Fiction Report</title>
<link>http://www.vanguardreport.com</link>

<description>The Vanguard Science Fiction Report</description>
<language>en-us</language>

<item>
   <title>Still here...</title>
   <link>http://www.vanguardreport.com/phpnuke/modules.php?name=News&amp;file=rssArticle&amp;sid=857</link>

   <description>No, I haven't abandoned this site, I've just been overwhelmed lately.  (Check out my personal blog if ...</description>
</item>

<item>
   <title>Serenity trailer hits the web</title>
   <link>http://www.vanguardreport.com/phpnuke/modules.php?name=News&amp;file=rssArticle&amp;sid=856</link>

   <description>The trailer for the film version of Firefly, Serenity, is now available on the web.  I'm hoping they ...</description>
</item>
...
</channel>
</rss>

Both basic information about the feed and a set of item elements are contained in the channel element, which is itself contained in the root element. An RSS 1.0 feed is similar, with, among other things, three important differences:

<?xml version="1.0" encoding="iso-8859-1"?>

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  ...
  xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://www.chaosmagnet.com/blog/">
   <title>Chaos Magnet</title>
   <link>http://www.chaosmagnet.com/blog/</link>

   <description>The personal and professional ramblings of Nicholas Chase.</description>
...
</channel>

<item rdf:about="http://www.chaosmagnet.com/blog/archives/000649.html">
   <title>Musings on life ... and veterans</title>
   <link>http://www.chaosmagnet.com/blog/archives/000649.html</link>

   <description>It's a weekend for closure after Ray's crossing, and I think I've pretty much settled things in my own head. Let me warn you that this is a long post -- at least for me -- and that unlike most...</description>
   ...
</item>
<item rdf:about="http://www.chaosmagnet.com/blog/archives/000648.html">
   <title>The blog is complete: The Darth Side</title>

   <link>http://www.chaosmagnet.com/blog/archives/000648.html</link>
   <description>I'm still kind of reeling here, trying to finish funeral arrangements, but I took a break and found that The Darth Side: Memoirs of a Monster has come to the end of its run. I don't usually gush about blogs,...</description>
   ...
</item>
...

</rdf:RDF>

In this case, the overall structure is similar, but the three important exceptions are the presence of namespaces, the fact that the root element is RDF instead of rss, and the fact that the item elements are children of the root element and not the channel element. So what we need to do is create an XSLT style sheet that applies to both structures:

<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/*[1]">
  <html>
  <body>
    <h2><xsl:value-of select="*[1]/*[local-name()='title']" /></h2>

    <table>
      <xsl:for-each select="*[local-name()='item']">
      <tr>
        <td>
            <xsl:element name="a">
               <xsl:attribute name="href"><xsl:value-of select="*[local-name()='link']" /></xsl:attribute>

               <xsl:attribute name="target">_blank</xsl:attribute>
               <b><xsl:value-of select="*[local-name()='title']"/></b>
            </xsl:element>
            <br />
            <xsl:value-of select="*[local-name()='description']"/>

         </td>
      </tr>
      </xsl:for-each>
      <xsl:for-each select="*[1]/*[local-name()='item']">
      <tr>
        <td>

            <xsl:element name="a">
               <xsl:attribute name="href"><xsl:value-of select="*[local-name()='link']" /></xsl:attribute>
               <xsl:attribute name="target">_blank</xsl:attribute>
               <b><xsl:value-of select="*[local-name()='title']"/></b>

            </xsl:element>
            <br />
            <xsl:value-of select="*[local-name()='description']"/>
         </td>
      </tr>
      </xsl:for-each>

    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

Notice that because we're dealing with potentially different structures, rather than selecting the root element by name, I'm selecting it by position. The asterisk (*) selects all of the child elements, and the predicate (the part in brackets ([])) indicates the position within that list. So our main template matches the first element child of the document root, which is the root element itself. From there, I'm selecting elements based on their local-name(), which is the same whether we're using namespaces or not.

Finally, I'm displaying any item elements that are children of the channel element or the root element. In any given feed, only one set will be present, so we can use this stylesheet for both structures.
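To see why a single stylesheet can serve both layouts, it may help to model the selection logic outside of XSLT. The following is only a sketch in plain JavaScript and is not part of the reader itself: the miniature trees (objects with name and children properties) and the helper names localName(), childrenNamed(), and findItems() are all hypothetical, and the prefix stripping only approximates what XPath's local-name() reports.

```javascript
// Strip any namespace prefix, roughly what XPath's local-name() reports
// for a prefixed element name such as "rdf:RDF".
function localName(qualifiedName) {
  var colon = qualifiedName.indexOf(":");
  return colon === -1 ? qualifiedName : qualifiedName.substring(colon + 1);
}

// Return the children of `node` whose local name matches `name`.
function childrenNamed(node, name) {
  var matches = [];
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    if (localName(children[i].name) === name) {
      matches.push(children[i]);
    }
  }
  return matches;
}

// Mirror the stylesheet's two for-each loops: items directly under the root
// (the RSS 1.0 layout), followed by items under the root's first child
// (the RSS 0.9x layout, where that first child is the channel).
function findItems(root) {
  var direct = childrenNamed(root, "item");
  var first = (root.children || [])[0];
  var nested = first ? childrenNamed(first, "item") : [];
  return direct.concat(nested);
}

// Hypothetical miniature versions of the two sample feeds.
var rss091 = {
  name: "rss",
  children: [
    { name: "channel", children: [
      { name: "title" },
      { name: "item" },
      { name: "item" }
    ] }
  ]
};

var rss10 = {
  name: "rdf:RDF",
  children: [
    { name: "channel", children: [{ name: "title" }] },
    { name: "item" },
    { name: "item" }
  ]
};

console.log(findItems(rss091).length); // 2
console.log(findItems(rss10).length);  // 2
```

In each tree only one of the two searches finds anything, which is the same reason the stylesheet can run both for-each loops without producing duplicate rows. Note that the sketch, like the stylesheet, assumes the channel is the first child element of the root.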

Is this an exhaustive stylesheet for any and all syndicated feeds? Of course not. But that's not what we're here to discuss today. We're here to explain how to run the transformation in the browser.

When last we left our document, we had implemented code that would asynchronously request an HTML file (or any other file, for that matter) and display it on the page. With that in place, it seems natural that if we request any other files, such as an XSLT style sheet, we should probably do it asynchronously. So let's start with that.

Because we only have a single stylesheet to load, it would be silly to load it every time we load a new feed, so let's go ahead and load it asynchronously when we load the page. First we'll create the request:

<script type="text/javascript">

var req;
var styleReq;
var dest;
...
function loadStylesheet(){
  var url;  // declared locally so it doesn't leak into the global scope
  if (window.XMLHttpRequest){
      // Mozilla and other browsers with a native XMLHttpRequest
      url = "http://www.nicholaschase.com/ajaxdemo/rss1.xsl";
      styleReq = new XMLHttpRequest();
      styleReq.onreadystatechange = processStylesheetChange;
      styleReq.open("GET", url, true);
      styleReq.send(null);
  } else if (window.ActiveXObject) {
      // Internet Explorer: separate stylesheet, ActiveX request object
      url = "http://www.nicholaschase.com/ajaxdemo/rss1ie.xsl";
      styleReq = new ActiveXObject("Microsoft.XMLHTTP");
      if (styleReq) {
          styleReq.onreadystatechange = processStylesheetChange;
          styleReq.open("GET", url, true);
          styleReq.send();
      }
  }
}

</script>

</head>
<body onload="loadStylesheet()">

<table width="100%" border="0">
...

The loadStylesheet() function should look familiar, because it's virtually identical to the loadHTML() function we created to load the content in the first place. The differences here are that a) we don't need a destination div, and b) we're not passing in a URL. No, in this case, we're specifically setting the URL for the style sheet within the function, based on which browser we're using. The XSL transformation engine in Internet Explorer doesn't do well with namespaces, so here we have a chance to create a separate style sheet to get around that problem.
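The browser test and the choice of stylesheet can also be separated from the request itself, which makes the decision easier to see and to try out. What follows is a minimal sketch rather than the code the page actually uses: pickStylesheetUrl() and the two constant names are hypothetical, and the win parameter stands in for the browser's window object. The URLs are the ones used in loadStylesheet().

```javascript
// URLs taken from loadStylesheet().
var STANDARD_STYLESHEET = "http://www.nicholaschase.com/ajaxdemo/rss1.xsl";
var IE_STYLESHEET = "http://www.nicholaschase.com/ajaxdemo/rss1ie.xsl";

// Hypothetical helper: `win` is the browser's window object (or a stand-in).
// Returns the stylesheet URL for that browser, or null if neither request
// mechanism is available.
function pickStylesheetUrl(win) {
  if (win.XMLHttpRequest) {
    return STANDARD_STYLESHEET;
  }
  if (win.ActiveXObject) {
    return IE_STYLESHEET;
  }
  return null;
}

// Stand-in objects; in a page you would pass `window` instead.
console.log(pickStylesheetUrl({ XMLHttpRequest: function () {} }));
console.log(pickStylesheetUrl({ ActiveXObject: function () {} }));
```

The order of the tests matters: a browser that exposes both objects takes the first branch, just as it does in loadStylesheet().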

In either case, we're creating a new request, styleReq, which loads asynchronously. Because of that, just as we did with the HTML requests, we need an event handler to actually process the data. In this case, it's processStylesheetChange():

var req;
var styleReq;
var stylesheetDoc;
var dest;
...
function processStylesheetChange(){
  if (styleReq.readyState == 4){
    if (styleReq.status == 200){

       if (window.XMLHttpRequest){

           var dp = new DOMParser();
           stylesheetDoc = dp.parseFromString(styleReq.responseText, "text/xml");

       } else if (window.ActiveXObject) {

           stylesheetDoc = new ActiveXObject("Microsoft.XMLDOM");
           stylesheetDoc.async = false;
           stylesheetDoc.loadXML(styleReq.responseText);
       }

    } else {
       alert("Can't load stylesheet:"+styleReq.status);
    }
  }
}
...

Here's where things get interesting. The whole point of this exercise is to use this stylesheet to transform any XML data we load, so we need to get the stylesheet into a DOM Document. In an ideal world, we could simply assign it by requesting responseXML instead of responseText, but that makes the assumption that the target web server is set up to send the proper MIME type for XML files. Unfortunately, many aren't, and that includes some of the largest web hosting companies on the planet. So we get around that by actually parsing the text returned by the styleReq request.
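As an aside, the two approaches don't have to be mutually exclusive. The sketch below is a variation on the technique, not what processStylesheetChange() does: it uses responseXML when the server happened to deliver a usable Document and parses responseText otherwise. The helper name documentFromRequest() and its parseText parameter (any function that turns a string into a Document) are hypothetical.

```javascript
// Hypothetical helper: return an XML Document for a completed request.
// `request` is an XMLHttpRequest-like object; `parseText` is whatever
// function turns a string into a Document in the current browser.
function documentFromRequest(request, parseText) {
  var xml = request.responseXML;
  // responseXML is only usable when the server labeled the response as XML
  // and the body parsed; otherwise fall back to parsing the raw text.
  if (xml && xml.documentElement) {
    return xml;
  }
  return parseText(request.responseText);
}
```

Checking for documentElement as well as for the Document itself covers the case in which the browser hands back an empty Document rather than null.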

For Mozilla-based browsers, this means using the built-in DOMParser object. First we instantiate it, and then we use it to parse the string data of the request as though it came in as an HTTP response with the MIME type text/xml.

For Internet Explorer, we take a different tack. First, we create a new XMLDOM ActiveX object. Because the data is already present, we'll make our lives easier by performing the parsing synchronously. From there, we simply load the XML text.
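Because the same two-way parsing logic is needed for both the stylesheet and the feed, it could be gathered into one helper. This is a sketch under a few assumptions: parseXml() is a hypothetical name, win stands in for the window object, and the helper tests for the parser objects directly rather than for window.XMLHttpRequest as the listing does.

```javascript
// Hypothetical helper: parse `text` into a DOM Document using whichever
// parser the window-like object `win` provides. Throws if neither exists.
function parseXml(text, win) {
  if (win.DOMParser) {
    // Mozilla-style path: treat the string as a text/xml response.
    var parser = new win.DOMParser();
    return parser.parseFromString(text, "text/xml");
  }
  if (win.ActiveXObject) {
    // Internet Explorer path: load the string into an XMLDOM object.
    var doc = new win.ActiveXObject("Microsoft.XMLDOM");
    doc.async = false; // finish parsing before loadXML() returns
    doc.loadXML(text);
    return doc;
  }
  throw new Error("No XML parser available");
}
```

In a page, the call would look like stylesheetDoc = parseXml(styleReq.responseText, window).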

Now we have the style sheet in a DOM Document, ready for use when we load a feed. Let's look at how to actually use it:

...
function processStateChange(){
  statusDiv = document.getElementById("status");
...
  if (req.readyState == 4){
    if (req.status == 200){
       response = req.responseText;

       if (dest == "feed"){

          if (window.XMLHttpRequest){

              var parser = new DOMParser();
              theDocument = parser.parseFromString(req.responseText, "text/xml");

              var xsltProcessor = new XSLTProcessor();
              xsltProcessor.importStylesheet(stylesheetDoc);
              response = xsltProcessor.transformToFragment(theDocument, document);

              destinationDiv = document.getElementById(dest);
              destinationDiv.innerHTML = "";
              destinationDiv.appendChild(response);


          } else if (window.ActiveXObject) {

...
          }

       } else {

          destinationDiv = document.getElementById(dest);
          destinationDiv.innerHTML = response;
       }
    } else {
       statusDiv.innerHTML = "Error: Status "+req.status;
    }
  }
}
...

First off, we'll check to see whether we even need to perform the transformation. We'll know that by the destination of our content; only an RSS feed goes in the feed div. From there, it's a simple matter of performing the transformation and adding the results to the feed div.

It probably comes as no surprise that the way in which we accomplish that depends on the browser we're using. For Mozilla, we'll first make a DOM Document out of the actual content, using the DOMParser, as we did with the style sheet. Next, we'll create a new XSLTProcessor object and import the style sheet it should use for any transformations it performs. Then we perform the actual transformation.

In this case, we're using the transformToFragment() function, passing in the node to transform (theDocument) and the owner Document for the resulting DocumentFragment object. (Remember, nodes don't just float out there in the ether; they need to have an owner Document, even if they aren't actually attached to it in a specific location.) Mozilla's XSLTProcessor also provides transformToDocument(), which returns a complete new Document rather than a fragment to insert into the existing page.

Once we have the transformed DocumentFragment, we're ready to add it to the page. To do that, we'll get a reference to the feed div, clear its contents, and then append the actual fragment (and thus, all of its children) to the div.

The overall process -- create Document, transform, add to the page -- is the same for Internet Explorer, but we'll handle it a little differently:

...
function processStateChange(){
  statusDiv = document.getElementById("status");
...
  if (req.readyState == 4){
    if (req.status == 200){
       response = req.responseText;

       if (dest == "feed"){

          if (window.XMLHttpRequest){
...
          } else if (window.ActiveXObject) {

              var theDocument = new ActiveXObject("Microsoft.XMLDOM");
              theDocument.async = false;
              theDocument.loadXML(req.responseText);

              destinationDiv = document.getElementById(dest);
              destinationDiv.innerHTML = theDocument.transformNode(stylesheetDoc);

          }

       } else {

          destinationDiv = document.getElementById(dest);
          destinationDiv.innerHTML = response;
       }
    } else {
       statusDiv.innerHTML = "Error: Status "+req.status;
    }
  }
}
...

As before, with the style sheet, we'll create the Document as a Microsoft.XMLDOM object, loading it with the text of the response. In this case, however, we don't need to create an XSLTProcessor; the ability to transform a node based on a stylesheet is built into the Document, and the transformNode() function returns the transformed text, making it simple to add it to the page.
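Seen side by side, the two branches differ mainly in what the transformation hands back: a DocumentFragment to append in one case, a string of markup to assign in the other. The following sketch pulls both into one function. It is an illustration under stated assumptions, not the page's actual code: showFeed() is a hypothetical name, win stands in for the window object, the branch test looks for XSLTProcessor and DOMParser directly rather than for window.XMLHttpRequest, and the returned labels exist only to show which path ran.

```javascript
// Hypothetical helper: transform `feedText` with `stylesheetDoc` and show
// the result in the element whose id is `destId`.
function showFeed(feedText, stylesheetDoc, destId, win) {
  var destination = win.document.getElementById(destId);

  if (win.XSLTProcessor && win.DOMParser) {
    // Mozilla-style path: the result is a DocumentFragment to append.
    var parser = new win.DOMParser();
    var feedDoc = parser.parseFromString(feedText, "text/xml");
    var processor = new win.XSLTProcessor();
    processor.importStylesheet(stylesheetDoc);
    var fragment = processor.transformToFragment(feedDoc, win.document);
    destination.innerHTML = ""; // drop whatever was displayed before
    destination.appendChild(fragment);
    return "fragment";
  }

  if (win.ActiveXObject) {
    // Internet Explorer path: the result is a string of markup.
    var ieDoc = new win.ActiveXObject("Microsoft.XMLDOM");
    ieDoc.async = false;
    ieDoc.loadXML(feedText);
    destination.innerHTML = ieDoc.transformNode(stylesheetDoc);
    return "string";
  }

  throw new Error("No XSLT support detected");
}
```

Inside processStateChange(), the feed branch would then reduce to a single call such as showFeed(req.responseText, stylesheetDoc, dest, window).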

The result is a page that displays the transformed XML as a list of linked headlines and descriptions, ready to be clicked.
