Rewriting HTML Attributes
Although most needed HTML tag attributes are already added to the Rewrite HTML Attributes section of the gateway profile, there may be instances where new attributes are supported in HTML tags. For rules added in this profile section to be considered during translation, the MIME type for the content should be text/html and the attribute value must be a raw URL. If the attribute value contains JavaScript content instead, then the attribute name should be added to the Rewrite HTML Attributes Containing JavaScript section.
The following is an example:
<A TARGET="content" HREF="iim.jnlp" NAME="CHAT" onMouseOver=document.images[0].src="images/chat2.gif" onMouseOut=document.images[0].src="images/chat.gif";> <IMG ALIGN="MIDDLE" SRC="images/chat.gif" BORDER="0" ALT=" Chat"></A>
In this anchor example, there are two tags, A and IMG. The tag attributes are TARGET, HREF, NAME, onMouseOver, onMouseOut, ALIGN, SRC, BORDER, and ALT. Only HREF, onMouseOver, onMouseOut, and SRC can contain URLs and need to be considered. The HREF and SRC attributes have already been added to the Rewrite HTML Attributes section of the gateway profile. Their values are both raw URLs, so they will be rewritten correctly.
The two event attributes onMouseOver and onMouseOut are included in the Rewrite HTML Attributes Containing JavaScript section out-of-box, so the rewriter will attempt to translate the URLs in their values. This translation succeeds only if a wildcard rule has been added to the Rewrite JavaScript Variables in URLs section of the gateway profile at the SP3 Hot Patch 3 install level.
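The classification walked through above can be sketched in a few lines of JavaScript. This is only an illustration: the two sets are hypothetical stand-ins for the Rewrite HTML Attributes and Rewrite HTML Attributes Containing JavaScript profile sections, the function name is invented, and case-insensitive name matching is an assumption rather than documented rewriter behavior.

```javascript
// Hypothetical stand-ins for the two gateway profile sections. The real
// lists live in the gateway profile and contain more entries.
const urlAttributes = new Set(["href", "src"]);
const javaScriptAttributes = new Set(["onmouseover", "onmouseout"]);

// Report how an attribute would be treated, based only on its name.
// Lowercasing the name is an assumption made for this sketch.
function classifyAttribute(name) {
  const key = name.toLowerCase();
  if (urlAttributes.has(key)) return "raw-url";
  if (javaScriptAttributes.has(key)) return "javascript";
  return "ignored";
}

// The attributes of the A and IMG tags from the example above.
const attributes = ["TARGET", "HREF", "NAME", "onMouseOver", "onMouseOut",
                    "ALIGN", "SRC", "BORDER", "ALT"];
for (const name of attributes) {
  console.log(name + ": " + classifyAttribute(name));
}
```

Running the sketch lists HREF and SRC as raw URLs, the two event attributes as JavaScript, and everything else as ignored, which mirrors the reasoning in the text.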
HTML BASE Tag
It is important to understand the role that the BASE tag plays in how documents are rewritten and what to expect in content that contains a BASE tag. The BASE tag is used by the browser for address completion of relative links. Instead of rewriting the BASE HREF attribute value and leaving the relative URLs alone, the rewriter comments out the BASE tag entirely and rewrites the relative URLs throughout the document by using the translated value of the BASE tag for address completion. The reason for this implementation is that multiple scraped channels can be displayed on the Portal Server desktop and that one uncommented BASE tag would affect any other Portal Server desktop content that might contain its own relative URLs.
Because the Portal Server desktop is essentially an HTML table after it is rendered, there is no way to have multiple BASE tags and have the relative URLs resolved correctly. Similarly, scraped pages that contain CSS content can adversely affect the entire Portal Server desktop if the CSS content contains generalized style definitions for basic HTML elements such as the BODY and TABLE tags.
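The BASE handling described above can be approximated as follows. This is a minimal sketch and not the rewriter's implementation: the function name is hypothetical, a regular expression stands in for real HTML parsing, only double-quoted HREF and SRC values are handled, and the further translation of each URL so that it is fetched through the gateway is left out.

```javascript
// Illustrative only: comment out the BASE tag and use its HREF value for
// address completion of the relative URLs in the rest of the markup.
function completeAgainstBase(html) {
  const match = html.match(/<BASE\s+HREF="([^"]*)"[^>]*>/i);
  if (!match) return html; // no BASE tag, so there is nothing to complete against

  const baseTag = match[0];
  const baseHref = match[1];
  const before = html.slice(0, match.index);
  const after = html.slice(match.index + baseTag.length);

  // Resolve each double-quoted HREF or SRC value against the BASE value.
  const resolve = (text) =>
    text.replace(/\b(HREF|SRC)="([^"]*)"/gi, (whole, attr, value) =>
      attr + '="' + new URL(value, baseHref).href + '"');

  // The BASE tag itself is kept, but only inside an HTML comment.
  return resolve(before) + "<!-- " + baseTag + " -->" + resolve(after);
}

const page = '<BASE HREF="http://www.iplanet.com/docs/">' +
             '<A HREF="guide/index.html">Guide</A>' +
             '<IMG SRC="../images/logo.gif">';
console.log(completeAgainstBase(page));
```

Because every relative URL is made absolute and the BASE tag no longer takes effect, the fragment can sit next to other channels on the same page without changing how their relative URLs resolve.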
One other limitation arises when content contains a BASE tag together with an APPLET or OBJECT tag that has no CODEBASE attribute. When the BASE tag is commented out, the browser can no longer find the APPLET or OBJECT code or data because no prepended path information is supplied. Always make sure that a CODEBASE attribute is used for these and similar tags when a BASE tag is also used within the same document. The SP4 Hot Patch 1 release handles this case by inserting a CODEBASE attribute if one does not already exist when a BASE tag is present in the document HEAD element. Even though the BASE HREF value can be a fully qualified URL that includes a resource name, it is recommended to end the HREF value with a directory name and a trailing slash.
The following is an example:
<BASE HREF="http://www.iplanet.com/docs/index.html">
<BASE HREF="http://www.iplanet.com/docs/">
The first instance is a valid BASE tag. The second instance ensures that relative URLs throughout the remainder of the document are resolved correctly. The SP4 Hot Patch 1 release addresses cases in which the BASE tag contains only the host and port information, but no path information, as in the following example:
<BASE HREF="http://www.iplanet.com:80">
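The effect of the trailing slash can be seen with the standard URL class available in browsers and Node.js, which follows the usual relative-reference resolution rules. This sketch shows standard resolution only. That the gateway's address completion behaves identically is an assumption, and the relative path is taken from the earlier anchor example.

```javascript
const relative = "images/chat.gif";

// BASE value ending in a resource name: the last path segment is dropped.
const fromResource =
  new URL(relative, "http://www.iplanet.com/docs/index.html").href;

// BASE value ending in a directory name and a trailing slash: the
// directory is kept.
const fromDirectory =
  new URL(relative, "http://www.iplanet.com/docs/").href;

// Directory name without the trailing slash: "docs" is treated as a
// resource name and dropped, so the result points somewhere else.
const fromBareName =
  new URL(relative, "http://www.iplanet.com/docs").href;

console.log(fromResource);
console.log(fromDirectory);
console.log(fromBareName);
```

The first two results are http://www.iplanet.com/docs/images/chat.gif, while the third is http://www.iplanet.com/images/chat.gif, which is why a directory name without the trailing slash is risky.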
Best Practices: HTML Programming for Use Through the Gateway
You should use the following best practices:
Always use CODEBASE attributes for tags that support them, as in the following example:
<APPLET CODEBASE="http://www.iplanet.com/java/" CODE="helloWorld.class">
End BASE HREF attribute URLs with a directory name followed by a trailing slash, as in the following example:
<BASE HREF="http://www.iplanet.com/docs/">
Avoid fractured HTML in which attribute values or tag bodies are split across multiple lines. The following example shows what to avoid:
document.write("<A HREF=\"\n");
document.write("http://www.iplanet.com\n");
document.write("\">link</A>\n");
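One way to avoid the fractured form is to assemble the complete tag as a single string before writing it, so that the attribute value sits on one line next to its attribute name. The helper below is a hypothetical sketch, not part of any gateway API, and how a given rewriter rule set treats the resulting document.write call is not something this example demonstrates.

```javascript
// Hypothetical helper: returns the whole anchor tag as one string so that
// the HREF value is never split across lines or across write calls.
function anchorMarkup(url, label) {
  return '<A HREF="' + url + '">' + label + "</A>";
}

// In a page this would be written with a single call, for example:
//   document.write(anchorMarkup("http://www.iplanet.com", "link") + "\n");
console.log(anchorMarkup("http://www.iplanet.com", "link"));
```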
Try to maintain well-formed HTML in which quotes are balanced and of the same type.
Avoid nested quotes where possible, and quote attribute values consistently across tag definitions. The following example nests quotes and mixes unquoted and single-quoted values, and should be avoided:
document.write("<IMG SRC='" + theSrc + "' HEIGHT=80 WIDTH='80'>");
NOTE
Here the gateway will blindly rewrite the SRC attribute without knowing the value of theSrc variable. There may be a fix for this by the time you read this guide, so check with Sun ONE support if you experience this problem and are unable to code around it.
Specify URLs with prepended path information whenever possible.
Having prepended path information makes it easier for the gateway to figure out address completion. The following is an example:
<IMG SRC="../../images/button.gif">
Do not use uppercase or mixed-case protocol identifiers in your URLs. The following example shows what to avoid:
<A HREF="HTTP://content-server.iplanet.com">
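Content authors who generate URLs programmatically can normalize the protocol identifier before emitting markup. The function below is a minimal sketch with a hypothetical name. It lowercases only the scheme at the start of the string and leaves relative URLs untouched.

```javascript
// Lowercase the scheme (protocol identifier) at the start of a URL.
// Strings that do not begin with a scheme, such as relative URLs, are
// returned unchanged.
function lowerCaseScheme(url) {
  return url.replace(/^[A-Za-z][A-Za-z0-9+.-]*:/,
                     (scheme) => scheme.toLowerCase());
}

console.log(lowerCaseScheme("HTTP://content-server.iplanet.com"));
console.log(lowerCaseScheme("../../images/button.gif"));
```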
Do not attempt to mimic the rewriter behavior by adding the gateway name to the URL prior to passing the content through the gateway.
Try to avoid setting attribute values to null if the attribute name has been added to the Rewrite HTML Attributes list. Prior to SP3 Hot Patch 3, a value of "" would still be rewritten.
The following is an example of what to avoid prior to SP3 Hot Patch 3:
<FRAMESET cols="20%, 80%">
  <FRAMESET rows="100, 200">
    <FRAME src="">
    <FRAME src="test-txt2.html">
  </FRAMESET>
  <FRAME src="test-txt3.html">
</FRAMESET>
NOTE
This is usually done so that JavaScript write methods can later be called to create the actual frame page. If this SRC attribute is rewritten and accessed using the Netscape Navigator browser, a directory listing may be presented, depending on the web server configuration, but the write methods will still execute. With the Internet Explorer browser, if the directory listing is turned off on the web server, an error redirection occurs in the browser, and the JavaScript write methods will no longer work. SP3 Hot Patch 3 fixed this inconsistency by simply not rewriting the null SRC attribute value. If white space occurs between the quotes, it will not be considered a null attribute value any longer and will be rewritten. So, it is important to ensure that the quotes occur directly next to one another to prevent unwanted rewriting from occurring.
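Because a value such as " " is not treated as null and is therefore rewritten, it can be worth scanning markup for attribute values that contain only white space. The check below is a hypothetical author-side sketch. It uses a regular expression in place of an HTML parser and only looks at double-quoted values.

```javascript
// Return the names of attributes whose double-quoted value consists only
// of white space. A value of "" is deliberately not reported, because the
// text above says it is left alone from SP3 Hot Patch 3 onward.
function findWhitespaceOnlyValues(html) {
  const pattern = /\b([A-Za-z]+)="(\s+)"/g;
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

const frames = '<FRAME src=""> <FRAME src=" "> <FRAME name="a" src="x.html">';
console.log(findWhitespaceOnlyValues(frames));
```

In this sample only the second FRAME tag is reported, since its SRC value contains a single space between the quotes.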
Avoid using the STYLE attribute with a background URL in HTML tags. The following example shows what to avoid:
<BODY STYLE="background-image:url(../../img/background.jpg); background-repeat:repeat;width:770px">
Avoid nesting tags of the same type when the nested content may require translation. The following example shows what to avoid:
<SPAN STYLE="color:blue; font-weight:bold; font-style:italic">
  <SPAN>
    Inside SPAN tag: <BR CLEAR="ALL">
    <A HREF="../../img/after.jpg"> <IMG SRC="../../img/after.jpg"> </A>
  </SPAN>
</SPAN>
<BR CLEAR="ALL">
Outside SPAN tag:<BR>
<A HREF="../../img/after.jpg"> <IMG SRC="../../img/after.jpg"> </A>
NOTE
Prior to SP3 Hot Patch 3, the rewriter would ignore the content between nested SPAN tags.
Avoid using a SCRIPT tag with a language attribute other than JavaScript. The following example shows what to avoid:
<SCRIPT Language="VBScript">
NOTE
There is currently no functionality in the rewriter to handle any scripted languages other than JavaScript.
Do not pass gzipped HTML through the gateway to be displayed by the client.
This HTML could contain URLs that will not be rewritten because the content is in a compressed format when it passes through the gateway.
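To check whether a response body is gzipped before routing it through the gateway, the first two bytes can be inspected: a gzip stream always begins with the bytes 0x1f and 0x8b. The function below is a minimal sketch with a hypothetical name. How the bytes are obtained from the content server is left out.

```javascript
// Return true when the byte sequence starts with the gzip magic number
// (0x1f 0x8b). Accepts a Uint8Array, a Node.js Buffer, or a plain array.
function looksGzipped(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Start of a gzip stream (magic number followed by the deflate method byte).
console.log(looksGzipped(new Uint8Array([0x1f, 0x8b, 0x08])));
// The ASCII bytes for "<HTML>", that is, uncompressed markup.
console.log(looksGzipped(new Uint8Array([0x3c, 0x48, 0x54, 0x4d, 0x4c, 0x3e])));
```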