RSS: Implementation Issues and Recommendations
As I discussed in my previous article, RSS is a powerful new channel for content distribution. In this piece, we'll look at implementation issues for both of the leading RSS standards (RSS 1.0 and RSS 2.0) and consider a step-by-step approach to successfully implement these standards on a web site.
CSS Transformation of RSS Feeds: A Contentious Issue
There has recently been some discussion about whether RSS feeds should be transformed using CSS or XSL. Although either approach is possible, the question remains open; I strongly recommend that you avoid transforming your feeds until it has been settled.
Picking Your RSS Flavor
The first question when considering an RSS implementation is what flavor of RSS you want to support. There are currently two leading standards: RSS 1.0 and RSS 2.0.
If integration with the RDF framework is your goal, use RSS 1.0. Otherwise, I recommend using RSS 2.0, which is generally much easier to implement.
Implementing RSS 1.0
Because RSS is an XML language, the first thing in an RSS file is the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
Next, we define this document as a Resource Description Framework (RDF) document:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">
</rdf:RDF>
The xmlns notation above binds each namespace to a prefix: any tag that starts with a given prefix belongs to the namespace declared for that prefix. For example, any tag starting with rdf: belongs to the namespace identified by http://www.w3.org/1999/02/22-rdf-syntax-ns# and should be interpreted according to the rules of that namespace.
NOTE
You may have noticed that one XML namespace in the example (http://purl.org/rss/1.0/) is not bound to a prefix. This namespace is the one for all RSS 1.0 applications, and we want it to be our default namespace. Therefore, it doesn't need to be associated with any prefix.
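To see how a parser resolves these prefixes, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python and the SAMPLE document are assumptions made for illustration; the article itself does not prescribe a parser. ElementTree reports every tag as {namespace}name, so the unprefixed channel tag ends up in the RSS 1.0 default namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Illustrative document assembled from the snippets in this section.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.tnl.net/channels/feeds/rss100/main.xml">
    <title>TNL.net weblog</title>
  </channel>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE.encode("utf-8"))

# The rdf: prefix is replaced by the namespace it was bound to.
print(root.tag)

# Unprefixed tags fall into the default (RSS 1.0) namespace.
channel = root.find(f"{{{RSS_NS}}}channel")
print(channel.tag)

# Prefixed attributes are resolved the same way.
about = channel.get(f"{{{RDF_NS}}}about")
print(about)
```

Note that the prefix itself never reaches the application; only the namespace it stands for does, which is why a feed could bind a different prefix to the RDF namespace and still be read the same way.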
After defining the document, the next thing we need to do is define the channel:
<channel rdf:about="http://www.tnl.net/channels/feeds/rss100/main.xml"> </channel>
In this channel definition, we use the rdf:about notation, which specifies that the enclosed information describes the resource whose URI appears between the quotes (" ").
In the channel tag, we then define the title, link, and a description as follows:
<channel rdf:about="http://www.tnl.net/channels/feeds/rss100/main.xml">
  <title>TNL.net weblog</title>
  <link>http://www.tnl.net/blog/</link>
  <description>TNL.net daily views</description>
</channel>
The title tag gives the title of the page that this RSS channel describes, and the link tag gives the location of that page. It's generally recommended that the title be no longer than 40 characters and the link no longer than 500 characters. A plain-text description follows, giving a quick summary of the channel.
The next section of the channel definition is a table of contents:
<items>
  <rdf:Seq>
    <rdf:li rdf:resource="http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe" />
    <rdf:li rdf:resource="http://tnl.net/blog/entry/MacWorld_2004:_What_was_NOT_said" />
    <rdf:li rdf:resource="http://tnl.net/blog/entry/2004_Predictions" />
  </rdf:Seq>
</items>
The table of contents is wrapped in <items></items> tags. Inside them, the <rdf:Seq></rdf:Seq> tags specify that the enclosed resources form an ordered sequence. Each rdf:li entry gives the URL of a resource that is described later in the feed. To ensure backward compatibility with previous RSS implementations, it's recommended that no more than 15 items be listed in the table of contents.
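The recommended limits mentioned so far (a 40-character title, a 500-character link, and at most 15 items) are easy to check before publishing. The following is only a sketch: the function name check_channel_limits and the wording of the warnings are hypothetical, and the thresholds are simply the figures quoted in this article.

```python
# Recommended limits quoted in the text above.
MAX_TITLE_CHARS = 40
MAX_LINK_CHARS = 500
MAX_ITEMS = 15


def check_channel_limits(title, link, item_urls):
    """Return a list of warnings; an empty list means no limit was exceeded."""
    warnings = []
    if len(title) > MAX_TITLE_CHARS:
        warnings.append(f"title is {len(title)} characters (recommended maximum {MAX_TITLE_CHARS})")
    if len(link) > MAX_LINK_CHARS:
        warnings.append(f"link is {len(link)} characters (recommended maximum {MAX_LINK_CHARS})")
    if len(item_urls) > MAX_ITEMS:
        warnings.append(f"{len(item_urls)} items listed (recommended maximum {MAX_ITEMS})")
    return warnings


toc = [
    "http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe",
    "http://tnl.net/blog/entry/MacWorld_2004:_What_was_NOT_said",
    "http://tnl.net/blog/entry/2004_Predictions",
]
for warning in check_channel_limits("TNL.net weblog", "http://www.tnl.net/blog/", toc):
    print(warning)
```

Because these are recommendations rather than hard rules, the sketch returns warnings instead of raising an error.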
After indicating what our channel is about and what items will be listed in the channel, it's time to provide information about the items themselves:
<item rdf:about="http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe">
  <title>True Innovation: HP Lightscribe</title>
  <link>http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe</link>
  <description>HP showcases innovation with the introduction of Lightscribe.</description>
</item>
An item is defined by the <item></item> tag; however, because RSS 1.0 is an RDF syntax, we need to define what the item is about, hence the rdf:about notation. This is then followed by the title of the item, the link at which the item is located, and the description of the item.
After defining each item, the only task that remains is closing the RDF document, finishing off the RSS 1.0 channel.
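Putting the pieces together, a feed like this one could be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree; the function name build_rss10 and the dictionary layout of each entry are assumptions made for illustration, not part of any RSS tooling. It writes each entry twice, once as an rdf:li in the table of contents and once as an item, so the two cannot drift apart.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Serialize RSS 1.0 elements without a prefix (default namespace) and RDF ones as rdf:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("", RSS)


def build_rss10(channel_url, title, link, description, entries):
    """Return an rdf:RDF element holding one channel and one item per entry."""
    root = ET.Element(f"{{{RDF}}}RDF")
    channel = ET.SubElement(root, f"{{{RSS}}}channel", {f"{{{RDF}}}about": channel_url})
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, f"{{{RSS}}}{tag}").text = text

    items = ET.SubElement(channel, f"{{{RSS}}}items")
    seq = ET.SubElement(items, f"{{{RDF}}}Seq")
    for entry in entries:
        # Table-of-contents entry inside the channel...
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": entry["link"]})
        # ...and the matching item, added as a sibling of the channel.
        item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": entry["link"]})
        for tag in ("title", "link", "description"):
            ET.SubElement(item, f"{{{RSS}}}{tag}").text = entry[tag]
    return root


feed = build_rss10(
    "http://www.tnl.net/channels/feeds/rss100/main.xml",
    "TNL.net weblog",
    "http://www.tnl.net/blog/",
    "TNL.net daily views",
    [{
        "title": "True Innovation: HP Lightscribe",
        "link": "http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe",
        "description": "HP showcases innovation with the introduction of Lightscribe.",
    }],
)
xml_text = ET.tostring(feed, encoding="unicode")
print(xml_text)
```

Closing the RDF document is handled by the serializer here, which emits the final rdf:RDF closing tag automatically.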
TIP
A full RSS 1.0 feed can be seen here.
Implementing RSS 2.0
Despite its name, RSS 2.0 is not the successor to RSS 1.0. It descends from the RSS 0.91 and 0.92 line, which developed in parallel with RSS 1.0, and as a result it is not backward compatible with RSS 1.0. It is, however, backward compatible with RSS 0.91 and 0.92. An RSS 2.0 document starts and ends with the rss tag, which carries the version number:
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
  </channel>
</rss>
One of the key differences between RSS 1.0 and RSS 2.0 is that RSS 2.0 doesn't treat the channel as a separate entity alongside the items; instead, the channel contains the items. As a result, the opening channel tag comes right after the opening rss tag, and the closing channel tag is the next-to-last tag in an RSS 2.0 document.
Following the channel tag are the title, link, and description of that channel, which are displayed as follows:
<title>TNL.net weblog</title>
<link>http://www.tnl.net/blog/</link>
<description>TNL.net daily views</description>
These are the only required elements at the top level of the channel. However, a number of optional elements can provide more details about the channel. One of those optional elements is the image tag, which was required in RSS 0.91 and RSS 0.92:
<image>
  <title>TNL.net weblog</title>
  <url>http://www.tnl.net/assets/images/logos/tnldotnetlogo.gif</url>
  <link>http://www.tnl.net/blog/</link>
  <width>125</width>
  <height>44</height>
  <description>TNL.net weblog</description>
</image>
The elements above define the title of the image, its web location (specified in the url tag), its width and height (used to display the image properly in RSS aggregators), and the image's description.
After the top levels of the channel have been defined, each of the items follows, with its title, link, and description. For example, an item entry might look like this:
<item>
  <title>True Innovation: HP Lightscribe</title>
  <link>http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe</link>
  <description>HP showcases innovation with the introduction of Lightscribe.</description>
</item>
This is very similar to the way RSS 0.91 and RSS 0.92 describe their items, but RSS 2.0 provides a number of optional tags that can add real value to an item's data. Aggregators particularly benefit from two of them: guid and pubDate. The guid tag uniquely identifies an item. For example, if a resource is published at one URL on a given day but later archived at a different URL, the guid gives an aggregator a way to link to the resource at its permanent URL. Similarly, if a resource is edited, aggregators that use the guid tag can recognize it as an item they have already seen rather than redisplaying the whole item to the reader. The pubDate tag, whose value follows the RFC 822 date format, can be included in every item and shows up as follows:
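The nesting described above, with the channel wrapping its own title, link, description, and items, can be sketched with Python's standard xml.etree.ElementTree. The function name build_rss20 and the dictionary layout of each entry are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET


def build_rss20(title, link, description, entries):
    """Return an rss element whose single channel contains every item."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # The three required channel-level elements come first.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    # Items are children of the channel, not siblings of it as in RSS 1.0.
    for entry in entries:
        item = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description"):
            ET.SubElement(item, tag).text = entry[tag]
    return rss


rss = build_rss20(
    "TNL.net weblog",
    "http://www.tnl.net/blog/",
    "TNL.net daily views",
    [{
        "title": "True Innovation: HP Lightscribe",
        "link": "http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe",
        "description": "HP showcases innovation with the introduction of Lightscribe.",
    }],
)
print(ET.tostring(rss, encoding="unicode"))
```

No namespace handling is needed here, which is a large part of why RSS 2.0 is the simpler format to produce.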
<pubDate>Thu, 8 Jan 2004 18:01:18 -0500</pubDate>
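Date strings in this format are easy to get subtly wrong by hand, so it is usually safer to let a library produce them. As a sketch, Python's standard email.utils module can format and parse such dates; the item built here is illustrative only, and the isPermaLink attribute comes from the RSS 2.0 specification rather than from the example above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The timestamp from the example above: 8 January 2004, 18:01:18, UTC-5.
published = datetime(2004, 1, 8, 18, 1, 18, tzinfo=timezone(timedelta(hours=-5)))

# format_datetime zero-pads the day ("08"); both "8" and "08" are valid.
pub_date = format_datetime(published)
print(pub_date)

# Parsing the article's own string yields the same moment in time.
parsed = parsedate_to_datetime("Thu, 8 Jan 2004 18:01:18 -0500")

# An item carrying both optional tags.
url = "http://tnl.net/blog/entry/True_Innovation:_HP_Lightscribe"
item = ET.Element("item")
ET.SubElement(item, "title").text = "True Innovation: HP Lightscribe"
ET.SubElement(item, "link").text = url
# isPermaLink="true" tells aggregators that the guid is itself a stable URL.
ET.SubElement(item, "guid", {"isPermaLink": "true"}).text = url
ET.SubElement(item, "pubDate").text = pub_date
print(ET.tostring(item, encoding="unicode"))
```

Using the permanent URL as the guid is one common choice; any string works as long as it never changes for a given item.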
TIP
A full RSS 2.0 feed can be seen here.