- Intended Audience
- Deployment Assumptions
- How the Gateway Works
- Concepts of the Rewriter
- Adding and Removing Rewriter Rules
- Methodology for Rule Extraction
- Out-Of-Box Rule Set
- Rewriting HTML Attributes
- Rewriting FORM Tag Input
- Rewriting JavaScript Content
- Rewriting Applet Parameters
- Rewriting Cascading Style Sheets
- Rewriting XML
- Performance
- Order Importance
- Case Studies: How to Configure the Gateway to Rewrite a Web-Based JavaScript Navigation Bar
- Third Party Application Cookbooks
- Exchange
- How to Get Hot Patches
- Glossary
- Acknowledgements
Concepts of the Rewriter
The flexibility of the Netlet and the uniqueness of the rewriter are what differentiate the Sun ONE Portal Server from most, if not all, other portal server offerings. Understanding how the rewriter works, and why rules must be configured, is essential to a successful and secure Portal Server deployment. Experimenting with the rewriter before the Portal Server is moved into production also helps avoid last-minute problems.
Rewriter Versus Browser
Because the rewriter is one of the most heavily used and sought-after features of the Portal Server, it is constantly being modified and enhanced. Many Portal Server administrators found that content that worked in SP2 no longer worked in SP3 and later versions. One major reason for this difference in behavior was the administrator's reliance (knowingly or not) on the browser to resolve relative URLs. The browser did this by using the location field as the equivalent of a BASE tag when the page was rendered.
This was most evident where the gateway had been deployed to give employees secure remote access to an existing intranet portal. In this case, an employee would log in to the gateway and be redirected to their own home page or to the corporate intranet portal, not to the Portal Server Desktop. The URL in the browser's location field would still have the gateway address prepended to it. The Portal Server session would remain active as long as requests continued to be made through the gateway. The browser would resolve relative URLs in the redirected page against the location field, as long as the document contained no BASE tag.
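The browser behavior described above can be sketched in Python. This is a minimal illustration, not gateway code, and it makes several assumptions:

- The intranet address intranet.example.com and the helper names `BaseHrefFinder`, `effective_base`, and `resolve_simple` are hypothetical.
- The resolver only handles plain relative paths such as images/logo.gif (no ../, no leading /, no query or fragment).

It shows that, without a BASE tag, a relative link stays behind the gateway-prefixed location. An absolute BASE href, by contrast, sends the browser straight to the internal server.

```python
from __future__ import annotations

from html.parser import HTMLParser


class BaseHrefFinder(HTMLParser):
    """Records the href of the first BASE tag in a document, if there is one."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: str | None = None

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def effective_base(html: str, location: str) -> str:
    """A BASE href wins if present; otherwise the browser falls back to its location field."""
    finder = BaseHrefFinder()
    finder.feed(html)
    finder.close()
    return finder.base_href or location


def resolve_simple(base: str, ref: str) -> str:
    """Resolve a plain relative path (no '..', '.', leading '/', query or fragment)
    by keeping the base up to and including its last '/' and appending ref."""
    return base[: base.rfind("/") + 1] + ref


# Hypothetical gateway-prefixed location and pages, for illustration only.
location = "https://ips-gateway/http://intranet.example.com/home/index.html"
plain_page = '<img src="images/logo.gif">'
based_page = '<head><base href="http://intranet.example.com/home/"></head><img src="images/logo.gif">'
print(resolve_simple(effective_base(plain_page, location), "images/logo.gif"))
# https://ips-gateway/http://intranet.example.com/home/images/logo.gif
print(resolve_simple(effective_base(based_page, location), "images/logo.gif"))
# http://intranet.example.com/home/images/logo.gif
```

In the second case, the link bypasses the gateway entirely unless the BASE href itself is rewritten.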
Wherever the browser performed this relative path resolution, it was masking inadequacies in the rewriter itself: there was content that the rewriter was missing or interpreting incorrectly. This was usually acceptable where the browser could compensate, but prior to SP3 there were large gaps in rewritten content, such as imported JavaScript and imported CSS. For instance, in SP2 the SRC attribute of the SCRIPT tag would be rewritten so that the browser could retrieve the JavaScript content correctly. However, the JavaScript content itself was not rewritten according to the rules specified in the gateway profile. Some administrators saw favorable results when the browser handled relative URLs. Others had problems because variables in imported JavaScript content were not rewritten correctly.
Administrators who deployed the gateway in front of the Sun ONE Portal Server Desktop in SP2 found that, with the new rewriter functionality in SP3, the browser location field could no longer be used to resolve relative URLs. The reason is that URLs on the Portal Server Desktop come from scraped channels or custom providers. Such URLs cannot be resolved against a document location that contains the gateway URL, the server URL, and DesktopServlet in the path.
For instance, consider the following URL:
https://ips-gateway/http://ips-server:8080/DesktopServlet
Consider a page scraped from an internal site other than ips-server. For such a page, a relative URL such as ../ would not work with this location, because the browser's relative path resolution would reference the wrong server.
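To see why ../ fails, the following Python sketch implements RFC 3986 reference resolution: merging the reference with the base path, then removing dot segments. Browsers follow the same procedure for cases like this one. The sketch assumes the following:

- Only relative-path references without a query or fragment are handled.
- The link images/logo.gif and the function name `resolve_relative` are hypothetical.

Because the internal URL is embedded in the path of the gateway URL, a ../ step consumes the server name itself.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def remove_dot_segments(path: str) -> str:
    """RFC 3986, section 5.2.4. Empty segments (as in 'http://') are preserved."""
    out: list[str] = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if out:
                out.pop()  # drop the last output segment together with its leading '/'
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading '/', if any) to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def resolve_relative(location: str, ref: str) -> str:
    """Resolve a relative-path reference against the browser's location field."""
    base = urlsplit(location)
    merged = base.path[: base.path.rfind("/") + 1] + ref
    return f"{base.scheme}://{base.netloc}{remove_dot_segments(merged)}"


location = "https://ips-gateway/http://ips-server:8080/DesktopServlet"
print(resolve_relative(location, "images/logo.gif"))
# https://ips-gateway/http://ips-server:8080/images/logo.gif
print(resolve_relative(location, "../images/logo.gif"))
# https://ips-gateway/http://images/logo.gif
```

The first result points at ips-server even if the link came from a different internal site. The second result no longer names a server at all.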
Relative URLs that are not rewritten to absolute URLs should be avoided for the following reasons:
- The Portal Server must ensure that requests for internal content always come back through the gateway.
- Absolute URLs are used to determine whether URLs need to be rewritten, based on their domain and/or subdomain.
- Absolute URLs are compared to the user's profile to determine whether the user has permission to retrieve the specified content.
- Absolute URLs avoid situations where the browser resolves a relative path to the wrong fully qualified URL.
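The alternative can be sketched in Python: resolve each reference against the page's real internal URL, then route the absolute result back through the gateway. This is not the Portal Server's implementation, and it rests on these assumptions:

- The gateway address, the page URL, and the function name `rewrite_for_gateway` are hypothetical.
- The reference has not already been rewritten.

```python
from urllib.parse import urljoin

GATEWAY = "https://ips-gateway"  # hypothetical gateway address


def rewrite_for_gateway(page_url: str, ref: str) -> str:
    """Make a (possibly relative) reference absolute against the page's real
    internal URL, then prefix the gateway so the request returns through it."""
    absolute = urljoin(page_url, ref)  # resolve against the internal URL, not the browser location
    return f"{GATEWAY}/{absolute}"


print(rewrite_for_gateway("http://intranet.example.com/docs/index.html", "../images/logo.gif"))
# https://ips-gateway/http://intranet.example.com/images/logo.gif
```

Once the URL is absolute, its host name (for example, `urlsplit(absolute).hostname`) is available for the domain and permission checks listed above.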
Gateway Profile
The gateway profile is a component stored in LDAP. Its attributes and attribute values are used by the gateway for initialization and at runtime. Some of these attributes are used specifically by the rewriter to determine what content, if any, should be rewritten. To view the contents of the gateway profile, you can use the ipsadmin command to dump the component to an XML file. Alternatively, you can open the Administration Console and select Gateway Management, then Manage Gateway Profile. Both methods are explained in detail in "Adding and Removing Rewriter Rules."
The gateway profile contains entries for many aspects of gateway operation, but only the fields and attributes directly related to the rewriter are discussed here. Two settings control the overall behavior of the rewriter: the Rewrite All URLs Enabled checkbox and the DNS Domains and SubDomains list. The two settings interact: when the checkbox is checked, it overrides any entries in the DNS Domains and SubDomains list. The remainder of this document assumes that Rewrite All URLs Enabled is checked.
If Rewrite All URLs Enabled is not checked, any content you want rewritten must have its server domain, and subdomain if one exists, entered in the DNS Domains and SubDomains list. If no subdomain is associated with the domain, be sure to put a vertical bar after the domain name. For instance, to rewrite every URL that contains iplanet.com, you would add an iplanet.com| entry to the DNS Domains and SubDomains list. To rewrite only certain subdomains within that domain, you would add entries such as iplanet.com|internal, where internal is a fictitious subdomain name.
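One possible reading of these two settings is sketched below in Python. The entry format follows the examples above: a domain, a vertical bar, and an optional subdomain. The rest is an illustrative assumption, not the gateway's documented algorithm:

- The names `should_rewrite` and `parse_domain_entry` are hypothetical.
- Each entry holds at most one subdomain.
- A host matches when it equals the domain (or subdomain.domain) or ends with it.

```python
from __future__ import annotations


def parse_domain_entry(entry: str) -> tuple[str, str]:
    """Split a 'domain|subdomain' entry; the subdomain part may be empty."""
    domain, _, subdomain = entry.partition("|")
    return domain.strip().lower(), subdomain.strip().lower()


def should_rewrite(host: str, entries: list[str], rewrite_all: bool) -> bool:
    """Hypothetical check of whether URLs on this host would be rewritten."""
    if rewrite_all:
        # Rewrite All URLs Enabled overrides the DNS Domains and SubDomains list.
        return True
    host = host.lower()
    for entry in entries:
        domain, subdomain = parse_domain_entry(entry)
        suffix = f"{subdomain}.{domain}" if subdomain else domain
        # Assumed suffix match: the host is the suffix itself or a name beneath it.
        if host == suffix or host.endswith("." + suffix):
            return True
    return False


print(should_rewrite("www.iplanet.com", ["iplanet.com|"], rewrite_all=False))          # True
print(should_rewrite("www.iplanet.com", ["iplanet.com|internal"], rewrite_all=False))  # False
```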
Rule Interpretation by the Gateway
Rules tell the gateway how to determine whether content passing through it contains a URL that needs to be rewritten. Rules are strings, in some cases containing wildcards, that are used as regular expressions. Each rewriter attribute or field in the gateway profile has an associated environment in which its rules apply. It also has a top-down order in which the rules are compared, one by one, against the content of the document or document section to see whether any URLs require translation.
One way an environment can be determined is by an HTML tag, such as the SCRIPT tag for JavaScript code or the STYLE tag for CSS. Within each environment, there are different data constructs that may require different syntactic interpretation by the gateway. The JavaScript language, for instance, has functions that take parameters. If one of these parameters is a URL, a rule must be added to the gateway profile under the Rewrite JavaScript Function Parameters section. That rule identifies the function name and the parameter that requires rewriting.
Environments can also be determined by MIME type, so that when the gateway retrieves content, it compares that content to the appropriate subset of rules and rule values. Imported JavaScript code would carry a MIME type of application/x-javascript, extracted from the browser GET request, so that when the gateway retrieves the content, it knows which environment to use to rewrite it.
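Top-down wildcard matching can be illustrated with a short Python sketch. It converts each rule to a regular expression (every * becomes .*) and returns the first rule that matches a candidate string. Several details are assumptions, and the gateway's actual matching code may differ:

- The rule strings *Url and imgSrc are hypothetical.
- The function names `wildcard_to_regex` and `first_matching_rule` are hypothetical.
- Stopping at the first match is assumed here to show why rule order matters.

```python
from __future__ import annotations

import re


def wildcard_to_regex(rule: str) -> re.Pattern:
    """Escape the rule literally, then turn each escaped '*' back into '.*'."""
    return re.compile(re.escape(rule).replace(r"\*", ".*"))


def first_matching_rule(rules: list[str], candidate: str) -> str | None:
    """Compare rules top-down and return the first one matching the whole candidate."""
    for rule in rules:
        if wildcard_to_regex(rule).fullmatch(candidate):
            return rule
    return None


rules = ["imgSrc", "*Url"]  # hypothetical rules, tried in this order
print(first_matching_rule(rules, "homeUrl"))  # *Url
print(first_matching_rule(rules, "title"))    # None
```

Placing a broad rule such as * ahead of *Url would make it match first, which is why ordering deserves attention.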
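A simplified Python sketch of what a function-parameter rule accomplishes is shown below. It does not reproduce the rule syntax of the Rewrite JavaScript Function Parameters field. Every detail is a hypothetical assumption for illustration:

- The function openPage, the 0-based parameter positions, and the gateway and intranet addresses are made up.
- The regular-expression approach is one possible technique, not the gateway's.
- Only simple calls are handled: arguments must be string literals or identifiers, with no nested parentheses or embedded commas.

```python
from __future__ import annotations

import re
from urllib.parse import urljoin

GATEWAY = "https://ips-gateway"  # hypothetical gateway address


def rewrite_function_params(js: str, function_name: str, positions: set[int], page_url: str) -> str:
    """Rewrite string-literal arguments at the given 0-based positions in every call
    to function_name. Assumes no nested parentheses and no commas inside arguments."""
    call = re.compile(r"\b" + re.escape(function_name) + r"\(([^()]*)\)")

    def fix_call(match: re.Match) -> str:
        args = [arg.strip() for arg in match.group(1).split(",")]
        for i in positions:
            arg = args[i] if i < len(args) else ""
            # Only quoted literals are rewritten; variables are left untouched.
            if len(arg) >= 2 and arg[0] in ("'", '"') and arg[-1] == arg[0]:
                absolute = urljoin(page_url, arg[1:-1])
                args[i] = f"{arg[0]}{GATEWAY}/{absolute}{arg[0]}"
        return f"{function_name}(" + ", ".join(args) + ")"

    return call.sub(fix_call, js)


print(rewrite_function_params("openPage('news/today.html', 'main');", "openPage", {0},
                              "http://intranet.example.com/portal/index.html"))
# openPage('https://ips-gateway/http://intranet.example.com/portal/news/today.html', 'main');
```

Note that the sketch normalizes the spacing between arguments to ", ", which a production rewriter would probably avoid.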
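Selecting an environment by MIME type amounts to a lookup like the Python sketch below. The sketch makes these assumptions:

- The mapping, the environment labels, and the function name `environment_for` are hypothetical.
- A MIME type is available, possibly followed by parameters such as charset.
- Unrecognized types pass through unrewritten.

```python
from __future__ import annotations

# Hypothetical mapping from MIME type to the rewriter environment whose rules apply.
ENVIRONMENTS = {
    "text/html": "HTML",
    "application/x-javascript": "JavaScript",
    "text/javascript": "JavaScript",
    "text/css": "CSS",
    "text/xml": "XML",
}


def environment_for(content_type: str) -> str | None:
    """Pick a rule environment from a MIME type such as 'text/css; charset=UTF-8'.
    Returns None for types assumed to pass through unrewritten."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return ENVIRONMENTS.get(mime)


print(environment_for("application/x-javascript"))  # JavaScript
print(environment_for("image/gif"))                 # None
```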