www4mail Technicalities — A Road Map, Statistics, and Future Developments
The www4mail software is written in modular Perl because Perl offers strong string processing, pattern matching, and pattern replacement functions. The www4mail Perl code has undergone three major public releases, each of which is described briefly below.
Version 2.0: The first public release, dated December 1998, depended heavily on the Lynx text-mode browser. In fact, it required patching Lynx 2.8.1 to support cookies when working in dump mode. The www4mail Perl script was run via sendmail or some other Mail Transport Agent (MTA) and was quite large. Because it was not very memory efficient, it ran into problems as soon as the user base grew. In addition, its string handling was not very robust. For example, unusual characters such as regex metacharacters appearing in a request could break the software and generate an error report to the user.
Versions 2.2 - 2.4.x: Released in May 1999. The 2.2 series marked the appearance of an internal browser function. www4mail depended less on Lynx for HTTP requests but still used Lynx for FTP and other non-HTTP requests. This version also introduced a private spooling system for www4mail processes, along with an AUTOLOAD routine for loading subroutines on demand. This autoloader was simple and based on the generic Perl autoloader routines. It improved memory usage but was still not optimal. Mail loops were an issue in the earlier versions and were resolved in later releases. A key milestone was the introduction of multilingual user interfaces.
Version 3.0: Release candidates for Version 3.0 were made available in May 2001. This series marked a rewrite of the www4mail code, making it cleaner and more portable. This version presents a completely new architecture, with an emphasis on making it easy to add new functionality to the www4mail server.
Version 3.0
The new Version 3.0 architecture integrates dynamic loading and unloading of modules (subroutines) on demand with the ability to register one or more modules as plug-ins to another module. To achieve this new architecture, a novel, more robust autoloader was developed with the following three characteristics:
The ability to automatically load Perl sub-routines (modules) from external files.
The ability to evaluate certain test conditions (as required by modules) before loading modules.
The ability to run other modules and plug-ins immediately before or after a module.
Essentially this means that before running a subroutine, www4mail can evaluate certain test conditions that are required for the correct functioning of the module. If the tests fail, the module is never loaded. This procedure has drastically improved the memory usage of the server process, as only the modules required to fulfill a request will ever get loaded into memory.
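The three characteristics above can be sketched in a few lines. This is only an illustration, written in Python rather than the Perl used by www4mail; the names `LazyModule`, `Autoloader`, `log_start`, and `ftp_fetch` are hypothetical, and the `loader` callable stands in for reading a subroutine from an external file.

```python
class LazyModule:
    """One registered module: a loader, its preconditions, and attached plug-ins."""

    def __init__(self, name, loader, tests=(), before=(), after=()):
        self.name = name
        self.loader = loader        # stand-in for loading code from an external file
        self.tests = list(tests)    # predicates that must all pass before loading
        self.before = list(before)  # names of plug-ins run before the module
        self.after = list(after)    # names of plug-ins run after the module
        self.func = None            # stays None until the module is first needed


class Autoloader:
    def __init__(self):
        self.modules = {}

    def register(self, module):
        self.modules[module.name] = module

    def is_loaded(self, name):
        return self.modules[name].func is not None

    def call(self, name, context):
        module = self.modules[name]
        # Evaluate the test conditions first; on failure nothing is loaded.
        if not all(test(context) for test in module.tests):
            return False
        for plugin in module.before:
            self.call(plugin, context)
        if module.func is None:
            module.func = module.loader()  # load on first use only
        module.func(context)
        for plugin in module.after:
            self.call(plugin, context)
        return True


autoloader = Autoloader()
autoloader.register(LazyModule(
    "log_start", lambda: lambda ctx: ctx["trace"].append("log_start")))
autoloader.register(LazyModule(
    "ftp_fetch", lambda: lambda ctx: ctx["trace"].append("ftp_fetch"),
    tests=[lambda ctx: ctx["url"].startswith("ftp://")],
    before=["log_start"]))

http_ctx = {"url": "http://example.org/", "trace": []}
http_ran = autoloader.call("ftp_fetch", http_ctx)
loaded_after_http = autoloader.is_loaded("ftp_fetch")

ftp_ctx = {"url": "ftp://example.org/file", "trace": []}
ftp_ran = autoloader.call("ftp_fetch", ftp_ctx)
```

In this sketch the HTTP request fails the module's test, so `ftp_fetch` is never loaded for it; the FTP request passes, which triggers the `log_start` plug-in and then the first load of the module itself.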
The third characteristic allows the server to know natively only a few modules; all other modules and routines are called based on their association with these native modules. The core native modules reflect the processing stages required for each email received. These modules are named as follows:
load_modules_list: Loads and defines a meta information block for all available modules. Each meta block has information about the source file on disk, the various tests required by the module, the list of plug-ins attached before and after a module, any alternative routines to this module, and any flags (indicating whether the module is to be reloaded from file each time it is called or if the module should only be run once during program runtime).
init: Performs initialization, which usually means defining suitable default values for most variables.
cmdline: Processes the command line arguments.
lireadmail: Collects the user's email from STDIN or from a local spooled file.
maincfg: Loads the server configuration files.
userauth: Authenticates the user's email address along with a preliminary quota check.
daemon: Breaks the connection with the parent sendmail process, if so configured.
parsemail: Checks the user's email and applies (if necessary) MIME or other suitable decoders.
globalcmd: Checks for and processes the global commands.
process: Performs a loop that processes the user's requested URLs one at a time.
endrun: Performs actions before closing, such as updating the error log file, cleaning up temporary files, and exiting.
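As a rough illustration of how a registry of meta blocks might drive these stages, the following Python sketch models one meta block per stage and a runner that honors the run-once flag. It is not the www4mail code, which is Perl: the field names, the `modules/<name>.pl` paths, and the `Runner` class are assumptions made for the example. The stage names are copied from the list above, and load_modules_list is modeled as the function that builds the registry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModuleMeta:
    """Hypothetical meta block with the kinds of fields load_modules_list defines."""
    source_file: str                                   # where the module lives on disk
    tests: List[str] = field(default_factory=list)     # tests required by the module
    plugins_before: List[str] = field(default_factory=list)
    plugins_after: List[str] = field(default_factory=list)
    alternatives: List[str] = field(default_factory=list)
    reload_each_call: bool = False                     # reload from file on every call
    run_once: bool = False                             # run only once per program run


# Stage names as given in the list above, in processing order.
STAGES = ["init", "cmdline", "lireadmail", "maincfg", "userauth",
          "daemon", "parsemail", "globalcmd", "process", "endrun"]


def load_modules_list():
    # The file paths are invented for this sketch.
    return {name: ModuleMeta(source_file=f"modules/{name}.pl", run_once=True)
            for name in STAGES}


class Runner:
    def __init__(self, registry, handlers):
        self.registry = registry
        self.handlers = handlers
        self.completed = set()

    def run(self, name, context):
        meta = self.registry[name]
        if meta.run_once and name in self.completed:
            return False  # the flag says this module may not run again
        self.handlers[name](context)
        self.completed.add(name)
        return True


registry = load_modules_list()
# Each handler just records its own name; real stages would do the actual work.
handlers = {name: (lambda ctx, n=name: ctx["trace"].append(n)) for name in STAGES}
runner = Runner(registry, handlers)
context = {"trace": []}
for stage in STAGES:
    runner.run(stage, context)
repeat_init = runner.run("init", context)
```

After the loop, `context["trace"]` lists the stages in order, and the second attempt to run `init` is refused because its meta block is flagged as run-once.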
It takes an average of one to two seconds to go from the init stage to the process stage during runtime; the modules and plug-ins in these stages are run only once per run. The process stage loops over the following modules for each requested URL:
urlauth: Checks the local ACL files to see if this URL is permitted.
urlduplicate: Checks whether the URL was already processed within the session. Some users send their requests as a MIME multipart message with both a text section and an HTML section, so the same URL can appear twice in one email.
urlspool: Checks to see if the computer is currently busy or if the configuration demands that the request be queued.
browser: Responsible for downloading the HTML page or binary file from a remote server. For HTML pages, additional tasks include removing line breaks from within tags and also including external scripts, style sheets, frames, and layers. This module and its plug-ins may take anything from two seconds to several minutes as dictated by factors such as network speed and the number of external scripts, style sheets, frames, and so on.
fileauth: Checks the resulting file against the local ACL configuration. At this stage, it can deny the request based on the file extension, the MIME type of the file, the appearance of one or more keywords or phrases, and even the PICS rating system. When www4mail encounters a binary file as a result of an HTTP request, it runs the ACL check on the main page of the site. This generally involves one or more calls to the browser module.
filter: Applies the necessary HTML code transformations and generates the user's desired output.
reply: Encodes the data (file) and mails it to the end user.
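The per-URL loop described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the Perl implementation: the `process_urls` function, the `steps` dictionary, the outcome labels, and the example URLs are all hypothetical, and each step is reduced to a plain callable.

```python
def process_urls(urls, steps, busy=lambda: False):
    """Run each URL through the per-URL steps; return (url, outcome) pairs."""
    seen = set()
    results = []
    for url in urls:
        if not steps["urlauth"](url):           # URL not permitted by the ACL
            results.append((url, "denied"))
            continue
        if url in seen:                         # urlduplicate: already handled
            results.append((url, "duplicate"))
            continue
        seen.add(url)
        if busy():                              # urlspool: defer when busy
            results.append((url, "queued"))
            continue
        document = steps["browser"](url)        # download the page or file
        if not steps["fileauth"](url, document):
            results.append((url, "blocked"))
            continue
        output = steps["filter"](document)      # transform into the reply format
        steps["reply"](url, output)             # encode and mail the result
        results.append((url, "sent"))
    return results


sent = []
steps = {
    "urlauth": lambda url: not url.startswith("http://blocked.example/"),
    "browser": lambda url: f"<html><body>{url}</body></html>",
    "fileauth": lambda url, doc: "forbidden" not in doc,
    "filter": lambda doc: doc.replace("<html><body>", "").replace("</body></html>", ""),
    "reply": lambda url, out: sent.append((url, out)),
}
urls = ["http://example.org/a", "http://example.org/a", "http://blocked.example/x"]
outcomes = process_urls(urls, steps)
```

Here the first URL is fetched and mailed, its repeat is recognized as a duplicate (as can happen when a message carries both a text and an HTML part), and the third is denied before any download takes place.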