Content-Oriented Components
Delivering human-readable XML content to users requires off-the-shelf components for authoring, presentation, and delivery. The principal benefit of XML as a content standard is that content from all sources uses the same infrastructure. Different groups of authors may contribute content, and it is easy to convert data from applications into content for users. Unfortunately, this capability raises the complexity of the authoring task because authors need to seamlessly merge static content from numerous contributors with dynamically generated content from applications. Well-defined processes become critical to coordinating multiple content production paths.
There are three things authors must do to take advantage of XML's benefits for content delivery: (1) format content as XML documents, (2) create layouts for this content, and (3) follow a rigorous production process to ensure uniform quality. Authoring tools, layout tools, and content management components provide the functionality necessary to deliver high-quality, high-appeal content quickly.
Authoring Tools
Much XML content includes static documents produced by human authors. Creating them with a text editor is a slow and error-prone process. Moreover, many documents have primarily static content with select pieces of information extracted dynamically from an application. Authors certainly do not want to have to enter programming-related tags manually to drive this capture process. Document authoring tools offer three primary features for improving productivity.
First, they offer word processing-like interfaces that enable authors to create ad hoc documents. This feature separates the free text information from the layout. Second, they offer a wizard- or form-based interface that authors can use to populate documents conforming to a more detailed DTD or Schema. This feature speeds data entry for content such as customer contact reports. Finally, they include features for specifying placeholders for application content that a content server uses to insert dynamic values at delivery time. This feature requires integration with a runtime content processing engine. While different products offer different mixes of these three features and are evolving quickly, some of the most popular products include Arbortext's Epic Editor, XMLSpy's Document Editor, and SoftQuad's XMetaL.
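As a rough illustration of the third feature, the following Python sketch shows how a content server might replace placeholders with application values at delivery time. The placeholder element, its source attribute, and the fill_placeholders function are hypothetical names invented for this example; actual authoring tools and content servers define their own placeholder conventions.

```python
import xml.etree.ElementTree as ET

# Hypothetical static document with one placeholder. The <placeholder> tag
# and its "source" attribute are invented for this illustration.
DOCUMENT = """<report>
  <title>Customer Contact Report</title>
  <para>Current account balance: <placeholder source="balance"/>.</para>
</report>"""


def fill_placeholders(xml_text, values):
    """Replace each <placeholder source="key"/> with the text of values[key]."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree can be modified safely.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "placeholder":
                continue
            # The dynamic value plus any text that followed the placeholder.
            text = str(values[child.get("source")]) + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                # No earlier sibling: append to the parent's leading text.
                parent.text = (parent.text or "") + text
            else:
                # Otherwise append to the text trailing the previous sibling.
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + text
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Calling fill_placeholders(DOCUMENT, {"balance": "1,250.00"}) returns the report with the placeholder replaced by the supplied value, leaving the static text around it untouched.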
Layout Tools
XSL is a highly sophisticated layout description language. A large proportion of layout designers with the necessary graphical design background may not have the technical background necessary to create stylesheets by hand. Therefore, they need stylesheet layout tools for automatically configuring different page regions to display different types of elements and specifying the text formatting based on rules such as element type and attribute values. These tools must also be aware of DTDs and Schema so that authors can match detailed layouts with detailed formats.
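The idea of formatting rules keyed on element type and attribute values can be sketched in a few lines of Python. This is not XSL and not how any particular layout tool works; the RULES table, the style names, and the functions below are assumptions made purely to illustrate rule matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatting rules: (element name, required attributes, style).
# Earlier rules take priority, so specific rules precede general ones.
RULES = [
    ("note", {"priority": "high"}, "bold-red"),
    ("note", {}, "italic"),
    ("title", {}, "heading"),
]
DEFAULT_STYLE = "body"


def style_for(element):
    """Return the style of the first rule matching the element."""
    for name, required, style in RULES:
        attributes_match = all(
            element.get(key) == value for key, value in required.items()
        )
        if element.tag == name and attributes_match:
            return style
    return DEFAULT_STYLE


def styled_elements(xml_text):
    """Pair every element in the document with the style chosen for it."""
    root = ET.fromstring(xml_text)
    return [(element.tag, style_for(element)) for element in root.iter()]
```

A stylesheet expresses the same kind of decision declaratively; a layout tool generates those declarations so that the designer does not have to write them by hand.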
Typically, layout tools allow designers to work in either concrete or abstract modes. In a concrete mode, the designer has an example XML content document and applies formatting to that document's information. The tool then abstracts this information to generate a generic stylesheet. By using several example content documents, the designer can ensure complete coverage of different possible cases. In an abstract mode, the designer applies formatting rules to a DTD or Schema. This approach gives the designer more flexibility in dynamically determining how to present information based on its value, but it is not as intuitive. Before selecting a layout tool, you should work with your designers to decide which mode is more important. The best tools allow designers to switch between modes, but they still tend to emphasize one over the other. Leading products include Arbortext's Epic Editor, eXcelon's Stylus Studio, IBM's XSL Stylesheet Editor, Whitehill <xsl> Composer, and XMLSpy's XSLT Designer.
Content Management Components
As an organization adopts XML, more and more people become involved in authoring XML content. As with HTML and SGML, managing this content poses a logistical challenge. Moreover, because software will automatically generate many documents, there is the possibility for new challenges and even greater content volume. Figure 5-7 shows the generic architecture of a CMS that can help address this problem. As shown, CMSs typically have several major components.
Figure 5-7 Generic CMS Architecture
Repository. The repository provides a robust and fault-tolerant location for storing XML documents, stylesheets, and DTDs or Schema. It consists of an interface that enables the content management system to store and retrieve information, a manager that controls storage mechanisms, and the storage mechanisms themselves. These mechanisms may include filesystems, relational databases, or object databases.
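The three-part structure of the repository (interface, manager, storage mechanisms) might be sketched as follows. All class and method names are hypothetical, the in-memory class merely stands in for a database, and choosing a mechanism by file suffix is an assumed policy rather than one the architecture prescribes.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageMechanism(ABC):
    """Common interface that every storage mechanism implements."""

    @abstractmethod
    def write(self, key, data): ...

    @abstractmethod
    def read(self, key): ...


class FileStorage(StorageMechanism):
    """Stores each item as a file under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, key, data):
        (self.root / key).write_text(data, encoding="utf-8")

    def read(self, key):
        return (self.root / key).read_text(encoding="utf-8")


class MemoryStorage(StorageMechanism):
    """In-memory stand-in for a relational or object database."""

    def __init__(self):
        self._items = {}

    def write(self, key, data):
        self._items[key] = data

    def read(self, key):
        return self._items[key]


class RepositoryManager:
    """Routes each item to a storage mechanism chosen by file suffix."""

    def __init__(self, default, by_suffix=None):
        self._default = default
        self._by_suffix = dict(by_suffix or {})

    def _mechanism_for(self, key):
        return self._by_suffix.get(Path(key).suffix, self._default)

    def store(self, key, data):
        self._mechanism_for(key).write(key, data)

    def retrieve(self, key):
        return self._mechanism_for(key).read(key)
```

The rest of the content management system would call only store and retrieve, so storage mechanisms can be added or swapped without affecting other components.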
Version control. The version control subsystem performs two functions. First, it prevents multiple authors from simultaneously making changes to the same content. Second, it maintains a version tree of all content. All requests to store or retrieve content must go through the version control system because it maintains the mapping of logical versions to physical data. CMSs that support WebDAV for version control offer the advantage of easier interoperability with different types of authoring clients.
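A minimal sketch of these two functions, exclusive locking and a version tree, could look like the following. The VersionControl class and its method names are hypothetical, and the content is held in memory for brevity; a real subsystem would map each logical version to physical data in the repository.

```python
class VersionControl:
    """Toy version control: one lock per document plus a tree of versions."""

    def __init__(self):
        self._locks = {}     # document name -> author holding the lock
        self._versions = {}  # version id -> (parent id, document name, content)
        self._next_id = 1

    def check_out(self, name, author):
        """Lock a document so that only one author can change it."""
        holder = self._locks.get(name)
        if holder is not None and holder != author:
            raise RuntimeError(f"{name} is locked by {holder}")
        self._locks[name] = author

    def check_in(self, name, author, content, parent=None):
        """Store a new version as a child of parent and release the lock."""
        if self._locks.get(name) != author:
            raise RuntimeError(f"{author} has not checked out {name}")
        version_id = self._next_id
        self._next_id += 1
        self._versions[version_id] = (parent, name, content)
        del self._locks[name]
        return version_id

    def retrieve(self, version_id):
        """Map a logical version to its stored content."""
        return self._versions[version_id][2]

    def history(self, version_id):
        """Walk parent links from a version back to the root of the tree."""
        chain = []
        while version_id is not None:
            chain.append(version_id)
            version_id = self._versions[version_id][0]
        return chain
```

Because two versions may name the same parent, the stored versions form a tree rather than a simple list, which is what allows parallel lines of revision.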
Authoring workflow. A content management system needs a component to coordinate the contributions and revision process. Typically, this coordination includes the maintenance of an authoring schedule and assignments for each author. It ensures the routing of documents from one author to another based on content dependencies. As published content reflects on the organization, this routing may also include approval and revision workflow.
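Routing based on content dependencies amounts to ordering documents so that each one reaches its author only after the documents it depends on. The sketch below uses the topological sorter in Python's standard library (Python 3.9 or later); the function names and the shape of the dependency and assignment tables are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def routing_order(dependencies):
    """Order documents so each follows the documents it depends on.

    dependencies maps a document name to the set of documents that must
    be finished first. A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


def route(dependencies, assignments):
    """Pair each document, in routing order, with its assigned author."""
    return [(doc, assignments[doc]) for doc in routing_order(dependencies)]
```

Approval and revision steps could be modeled the same way, as additional nodes that depend on the draft they review.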
Content processing. Once authors have submitted content, the CMS may offer a number of processing functions. Foremost among these functions is content indexing. If you commit to managing all your content within a CMS, you rely on the search functions it provides. To perform such searches efficiently, authors need to specify how to index the content. In cases where documents include dynamic information bound to application data, another processing function includes accessing, formatting, and distributing this data. Some CMSs offer advanced filtering and transformation processing to automatically create different views of content suitable for different audiences. For example, a filter might specify how to automatically generate a summary view for a particular type of document.
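Two of these processing functions, indexing and summary filtering, can be sketched briefly. The functions and their parameters are hypothetical: here the author's indexing specification is reduced to a set of element names, and the summary filter to a set of top-level elements to keep.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(documents, indexed_elements):
    """Map each word found in the chosen elements to the documents using it.

    documents maps a document id to its XML text; indexed_elements is the
    set of element names that the author has chosen to index.
    """
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            if element.tag in indexed_elements and element.text:
                for word in element.text.lower().split():
                    index[word].add(doc_id)
    return index


def summary_view(xml_text, keep):
    """Return a copy of the document with only the top-level children in keep."""
    root = ET.fromstring(xml_text)
    summary = ET.Element(root.tag, root.attrib)
    for child in root:
        if child.tag in keep:
            summary.append(child)
    return ET.tostring(summary, encoding="unicode")
```

Indexing only the elements an author names keeps the index small and the searches focused, at the cost of missing words that appear elsewhere in the document.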
Deployment management. When content becomes ready for consumption, an administrator must release it for distribution. Depending on the channel, managing this deployment may take several forms. For static content intended for Web servers, deployment might involve creating the directory structure and installing the content on all the Web server machines. For dynamic content intended for Web servers, it may require a substantial amount of configuration information governing network topology, access control, and performance parameters. In cases where you intend to syndicate your content, the CMS will need additional information to manage this process.
There is a wide variety of CMSs, each with its own target use. Content management is a complicated topic, and the choice of product requires considerable analysis. Factors to consider include the primacy of Web over other channels, the use of non-XML formats, and integration with dynamic data sources. Products to consider include Arbortext's Epic E-Content Engine (no repository), BroadVision Publishing Center, Chrystal's Astoria, Documentum4i, Interwoven TeamSite, OmniMark Technologies' OmniMark, Red Bridge Interactive's DynaBase, SiberLogic's SiberSafe, and Vignette's Content Suite.