Server Infrastructure
With an appropriate place to store XML data persistently, the next concern is distributing and manipulating this data. In modern Web application architectures, servers play critical roles in assembling, processing, and distributing information. Adding XML support to your server infrastructure mostly involves making sure that existing servers are XML-enabled, with perhaps the installation of a few XML-specific components. Most important, you must verify that XML capabilities meet the scalability and reliability demands of all server functions.
In general, there are three types of server components: data servers, application servers, and content servers. Data servers access, aggregate, and format data. Application servers execute business logic components and mediate distributed business processing. Content servers facilitate the acquisition of content, enhance its accessibility to users, and apply formatting. Figure 5-5 shows these different types of servers working together in a typical Web application environment. This type of server web provides the conduit for propagating XML documents within an enterprise and throughout the Internet.
While Figure 5-5 shows each server component as a distinct node, this arrangement isn't necessary. Server software may combine these components in different ways, and, in fact, different combinations lead to distinct product segments. Integration servers combine data server functions to aggregate information from multiple sources with application server functions to control the flow of business processes. Portal servers combine data server functions to access information from multiple sources with content server functions to filter this information based on user requirements. Personalization servers combine application server functions to calculate user needs with content server functions to customize their experiences dynamically. By understanding the roles of the three basic types of server functionality, you can evaluate whether such a combination suits your needs.
Figure 5-5 Types of Server Components
Data Servers
DBMSs inherently constrain the use of data. They have to choose a particular paradigm, such as relational or object. Relational DBMSs with normalized tables optimize the combination of data in different ways. Object DBMSs with associated instances optimize the traversal of information webs. Within a given paradigm, each individual database has a particular structure limiting the types of information it can store and the access patterns it supports. DBMSs do a wonderful job of managing data when a given database must support only a few types of applications and when each application relies on only a few databases. However, when a given database must support a wide variety of application types or a given application must rely on many different databases, satisfying these demands often taxes DBMSs to their limits. In such cases, an XML-enabled data server can improve flexibility and performance.
XML broadens the use of data. The ability to design special purpose data formats quickly encourages the combination of information managed in different databases. So while data servers have existed for some time, XML's emergence as a solution to information exchange problems has elevated their role. Data servers perform three major functions: (1) they unify the data access interface to simplify application development, (2) they aggregate data from different sources to deliver customized packages of information, and (3) they consolidate requests to DBMSs to improve performance. XML requires special support only in the first two functions. Because optimizing performance through consolidation strategies like data caching and connection pooling occurs internally to the data server, the use of XML as the format does not affect this function.
An XML-enabled data server supports XML as the unified data access format. When an application submits a request to the data server, the data server fulfills it with an XML document. Given the rise of XML messaging discussed in Chapter 4, the data server should probably support this interaction over SOAP, using an interface specified in WSDL. Merely retrieving ad hoc bits of data as XML documents that the application then has to translate into programming data structures doesn't add much benefit. Programmatic solutions such as ODBC and JDBC already satisfy this need. The more substantial benefit comes from defining synthetic XML documents that form customized packages of data suited to a particular purpose.
To deliver a synthetic XML document, the data server must have a mapping between the document type and the structures managed by backend DBMSs. A developer defines an XML DTD or Schema for the document type and then maps fields in the various database schemas to element and attribute types. The developer also defines the keys used to select the correct records for populating a document instance. At runtime, an application submits a request for a synthetic document type and the appropriate keys. The data server then looks up the mapping, constructs queries based on the mapping and the keys, and puts the results into an XML document. The resulting document is valid with respect to the specified DTD or Schema.
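The runtime steps just described (look up the mapping, build queries from the mapping and the keys, assemble the results) can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the mapping table, the table and column names, and the customerSummary document type are all hypothetical, a single in-memory SQLite database stands in for the backend DBMSs, and validation against a DTD or Schema is omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping for one synthetic document type:
# element name -> (table, column, key column).
CUSTOMER_SUMMARY_MAPPING = {
    "name": ("customers", "name", "customer_id"),
    "city": ("addresses", "city", "customer_id"),
}


def build_synthetic_document(conn, root_tag, mapping, key):
    """Query each mapped field by key and assemble one XML document."""
    root = ET.Element(root_tag, {"id": str(key)})
    for element_name, (table, column, key_column) in mapping.items():
        # Table and column names come from the developer-defined mapping,
        # not from the caller; only the key is passed as a bound parameter.
        row = conn.execute(
            f"SELECT {column} FROM {table} WHERE {key_column} = ?", (key,)
        ).fetchone()
        child = ET.SubElement(root, element_name)
        child.text = "" if row is None else str(row[0])
    return root


def demo():
    """Populate a throwaway database and request one synthetic document."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE addresses (customer_id INTEGER, city TEXT)")
    conn.execute("INSERT INTO customers VALUES (7, 'Acme Corp')")
    conn.execute("INSERT INTO addresses VALUES (7, 'Boston')")
    doc = build_synthetic_document(
        conn, "customerSummary", CUSTOMER_SUMMARY_MAPPING, 7
    )
    conn.close()
    return ET.tostring(doc, encoding="unicode")
```

Calling demo() returns a single customerSummary document whose name and city elements come from two different tables. In a real data server, each mapping entry could also name a different backend database.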
In some cases, a DBMS vendor may include some data server capabilities with its DBMS product. For instance, Oracle9i includes XML mapping capabilities. In cases where the need for a data server stems from a small set of homogeneous databases attempting to serve many different applications, this solution is sufficient. But when the need for a data server stems from a set of applications attempting to aggregate data across heterogeneous databases, you probably need a separate data server product.
Such products include eXcelon's eXtensible Information Server and Versant enJin, both of which are based on object persistence engines. Data servers require many of the capabilities of backend databases to provide high availability and transactional integrity. They use their own persistence engine as a staging area between applications and backend DBMSs. Therefore, most of the native XML store products discussed previously can also operate as XML data servers by adding features for synchronizing with backend databases. In fact, many vendors of these products are finding that this approach drives a substantial percentage of their sales. Conversely, data server products like eXcelon and enJin can operate as native XML stores, so distinctions between the two markets are blurring. When evaluating either type of product's suitability as a data server, focus on the facilities for mapping backend data to XML documents and the efficiency of performance optimization strategies like caching and pooling.
Application Servers
Application servers operate in the middle tier, applying business logic to data, then handing off the results for presentation. In this capacity, they have three primary reasons for working with XML documents.
They may need to accept data as XML documents from data servers.
They may need to provide business results as XML documents to content servers.
They may have to exchange XML-formatted business messages with other application servers.
To support these operations, the application server can supply basic and advanced services.
Basic services include the execution of XML and XSLT processors, as well as a SOAP implementation. Whether it extracts data from XML documents, exchanges XML business documents, or produces XML business results, the application server needs the access and creation capabilities of an XML processor. Because many developers use XSLT for document pre- and postprocessing, support for this standard should be part of the basic package. Interaction with XML-enabled data, application, and content servers almost certainly includes SOAP communication, so an implementation of the protocol is essential.
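To make the "access and creation capabilities" concrete, the following Python sketch creates a SOAP 1.1 envelope and then reads a value back out of it using only a general-purpose XML processor. It is an illustration under assumptions: the urn:example:orders namespace and the getOrderStatus operation are hypothetical, and a production SOAP implementation would also handle headers, faults, and transport.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:orders"  # hypothetical application namespace
PREFIXES = {"soap": SOAP_NS, "app": APP_NS}


def build_request(order_id):
    """Creation: assemble an envelope whose body carries one request element."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}getOrderStatus")
    ET.SubElement(request, f"{{{APP_NS}}}orderId").text = str(order_id)
    return ET.tostring(envelope, encoding="unicode")


def read_order_id(message):
    """Access: parse an incoming envelope and extract the order identifier."""
    envelope = ET.fromstring(message)
    node = envelope.find("soap:Body/app:getOrderStatus/app:orderId", PREFIXES)
    return None if node is None else node.text
```

For example, read_order_id(build_request(42)) returns the string "42", because element content is always text until the application converts it.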
Theoretically, because an application server can execute any code in a language it supports, providing basic services is simply a matter of downloading XML and XSLT processors plus a SOAP implementation, then installing them. Practically, assuring the performance and quality of execution requires the vendor at the very least to certify components for use with the application server and probably include the recommended packages in the product distribution. You want to make sure that the vendor has tested the particular components, can provide estimates of how much throughput these components can handle, and knows how to support their use with its application server. For J2EE application servers, most vendors recommend the Xerces XML processor, the Xalan XSLT processor, and either their own or a particular third-party SOAP implementation. Microsoft has its own XML processor, XSLT processor, and SOAP implementation for its application server products.
Advanced services tend to vary significantly across application servers and evolve rapidly over time. Therefore, it's more appropriate to focus on the categories of advanced services rather than particular instances. Most advanced services are delivered in the form of frameworks. There are abstraction frameworks and task frameworks. Abstraction frameworks give developers more flexibility to make future changes by performing operations at a higher level. Two excellent examples are Sun's Java API for XML Processing (JAXP) and Java API for XML Messaging (JAXM). Both of these frameworks provide high-level APIs for performing specific XML-related operations. By programming to these abstract APIs rather than the concrete APIs of specific components, developers make it possible to switch their XML processor or XML messaging protocol easily.
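The idea behind an abstraction framework can be shown with a small Python analogy. This is not JAXP itself. It is a sketch of the same pattern, in which application code depends on an abstract interface and a factory chooses the concrete XML processor. The TitleReader interface and the new_reader factory are hypothetical names invented for this illustration. The two concrete processors are modules from the Python standard library.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class TitleReader(ABC):
    """Abstract API: application code depends only on this interface."""

    @abstractmethod
    def first_title(self, xml_text: str) -> str:
        """Return the text of the first title element, or an empty string."""


class ElementTreeReader(TitleReader):
    """Concrete processor backed by xml.etree.ElementTree."""

    def first_title(self, xml_text):
        return ET.fromstring(xml_text).findtext(".//title", default="")


class MinidomReader(TitleReader):
    """Concrete processor backed by xml.dom.minidom."""

    def first_title(self, xml_text):
        dom = xml.dom.minidom.parseString(xml_text)
        nodes = dom.getElementsByTagName("title")
        if not nodes or nodes[0].firstChild is None:
            return ""
        return nodes[0].firstChild.data


READERS = {"etree": ElementTreeReader, "minidom": MinidomReader}


def new_reader(name="etree"):
    """Factory: the only place that knows which processor is in use."""
    return READERS[name]()
```

Switching processors means changing the name passed to new_reader (or a configuration value that supplies it). Code that calls first_title does not change.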
Task frameworks provide additional functionality for building specific types of applications. Personalization is a good example of a task framework used to produce XML documents for content servers. These types of applications use metadata about user preferences and metadata about content topics to generate customized content. Because XML is a convenient format for both types of metadata, there is the opportunity to deliver a package that greatly simplifies the development of such applications. But perhaps the best XML-related example of such an application is B2B messaging. This type of application touches on a host of issues, from specifying the allowable flows of messages, to generating views of executing processes, to integrating with back-end systems. Providing all this functionality would be difficult for a single application development team. By using XML, vendors can deliver a widely applicable framework that puts such applications within the reach of more organizations. All the major application server vendors, including BEA, IBM, Microsoft, Oracle, and Sun, provide their own flavors of both personalization and B2B messaging frameworks.
Content Servers
Content servers combine data from DBMSs, results from business operations, and authored content into presentation formats suitable for different users. XML-based technologies improve every stage of the fulfillment pipeline. At the very end of the pipeline, they enable dynamic layouts that better fit each user's needs. In the middle of the pipeline, they make it easier to connect a user to the exact information he wants. At the beginning of the pipeline, they make it easier to acquire the library of content necessary to satisfy the user base. Most content servers focus on one or two aspects of this pipeline, so implementing a complete XML content strategy may require several types of content servers.
The most common use for XML in content servers is applying dynamic presentation to XML content. This process occurs as described in Chapter 3's discussion of using XSLT to generate pages in presentation languages such as HTML, VoiceXML, and WML. Based on variables, including the type of client device, the type of content, and the localization settings for the user, the content server selects an XSLT transform and applies it to the XML document. Because most Web servers have programming extensions that support XSLT, you won't need any additional server infrastructure if all you want is dynamic presentation.
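The selection step can be sketched as a lookup that falls back from the most specific match to a default. The registry contents, the stylesheet file names, and the fallback order below are assumptions made for illustration. A real content server would define its own rules. Applying the chosen transform requires an XSLT processor and is not shown.

```python
# Hypothetical stylesheet registry keyed by (device, content type, locale).
# None acts as a wildcard for that position.
STYLESHEETS = {
    ("phone", "article", "fr"): "article-wml-fr.xsl",
    ("phone", None, None): "generic-wml.xsl",
    ("voice", None, None): "generic-voicexml.xsl",
    (None, "article", None): "article-html.xsl",
    (None, None, None): "default-html.xsl",
}


def select_stylesheet(device, content_type, locale):
    """Return the most specific registered stylesheet for a request."""
    # Candidate keys are tried from most specific to least specific.
    candidates = [
        (device, content_type, locale),
        (device, content_type, None),
        (device, None, None),
        (None, content_type, None),
        (None, None, None),
    ]
    for key in candidates:
        if key in STYLESHEETS:
            return STYLESHEETS[key]
    raise LookupError("no stylesheet registered for this request")
```

With this registry, a French-language article requested from a phone gets its own stylesheet, any other phone request falls back to the generic WML stylesheet, and an unrecognized device receives the default HTML layout.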
Customizing layouts for users is only part of the content delivery equation. Users also need help finding the content that addresses their immediate needs. Traditional search engines suffer from the problem (raised in Chapter 1) of distinguishing between different contexts for the same word. With XML content, a search engine can use the element structure and attribute values to improve search precision. Using an XML-aware search engine helps maximize the benefits of an XML-based content strategy. Usually, employing such a product involves assigning a dedicated server or cluster of servers to perform searches that then refer users to the appropriate content. Such standalone solutions include DocSoft's extend XML and XML Global's GoXML Search. Of course, most of the CMS and native XML store products discussed previously can perform searches on XML document collections, but this approach works only if you store all the content you plan to search in one of these products.
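The gain in precision is easy to see in a small Python sketch. The three documents and their element names are hypothetical, and the matching logic is deliberately naive. It illustrates the principle rather than how any of the products named above work.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection in which the word "Java" appears in three contexts.
DOCUMENTS = {
    "travel.xml": "<guide><destination>Java</destination><topic>Beaches</topic></guide>",
    "coding.xml": "<course><language>Java</language><topic>Servlets</topic></course>",
    "coffee.xml": "<menu><item kind='coffee'>Java</item></menu>",
}


def keyword_search(term):
    """Traditional search: match the term anywhere in the raw text."""
    return sorted(name for name, text in DOCUMENTS.items() if term in text)


def element_search(tag, term, attrs=None):
    """Structure-aware search: match the term only inside a given element,
    optionally requiring specific attribute values on that element."""
    required = attrs or {}
    hits = []
    for name, text in DOCUMENTS.items():
        root = ET.fromstring(text)
        for element in root.iter(tag):
            same_text = (element.text or "").strip() == term
            same_attrs = all(element.get(k) == v for k, v in required.items())
            if same_text and same_attrs:
                hits.append(name)
                break
    return sorted(hits)
```

Here keyword_search("Java") returns all three documents, while element_search("language", "Java") returns only coding.xml and element_search("item", "Java", {"kind": "coffee"}) returns only coffee.xml.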
XML-aware search engines leverage metadata at the element and attribute levels. However, metadata can also apply to entire collections of content. The foundation of the Semantic Web is the use of metadata to provide a conceptual map of an entire site or group of sites. Another W3C Recommendation, Resource Description Framework (RDF), provides a standardized XML vocabulary for describing the types of content offered, the relationships among content, and the conditions under which content might be relevant. Most site creators use an implicit information model in selecting and organizing content. RDF makes it possible to state this model explicitly. The availability of machine-readable models facilitates automated information retrieval, filtering, and visualization capabilities far beyond those of traditional search engines. The Semantic Web is in its early development, and much of the work is in the form of research and open source projects. However, in the near future, RDF may migrate into mainstream content infrastructure. Web servers will offer RDF descriptions. Search engines will use these descriptions as part of the search criteria. Authoring tools will generate these descriptions.
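As a rough illustration of what a machine-readable description looks like, the Python sketch below generates a small RDF/XML document and then filters it by topic. The example.com URLs and property values are hypothetical. The choice of Dublin Core title and subject properties is an assumption made for the example, since RDF itself does not dictate which vocabulary a site uses.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set

# Ask the serializer to use the conventional prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(resources):
    """Build an rdf:RDF element with one rdf:Description per resource URL."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    for url, properties in resources.items():
        description = ET.SubElement(
            rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url}
        )
        for name, value in properties.items():
            ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return rdf


def resources_with_subject(rdf, subject):
    """Return the URLs of all described resources with the given dc:subject."""
    return [
        description.get(f"{{{RDF_NS}}}about")
        for description in rdf.findall(f"{{{RDF_NS}}}Description")
        if description.findtext(f"{{{DC_NS}}}subject") == subject
    ]


# Hypothetical site model: two pages, each with a title and a topic.
SITE = describe({
    "http://example.com/guides/xml-stores": {
        "title": "Choosing an XML Store", "subject": "storage"},
    "http://example.com/guides/soap": {
        "title": "SOAP Basics", "subject": "messaging"},
})
```

A tool that understands the vocabulary can then select content by topic, as in resources_with_subject(SITE, "messaging"), without parsing the pages themselves.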
In addition to making it easier to find content, XML also makes it easier to acquire content. Content can come from two sources: You can create it, or you can borrow it. When creating content, the ability of multiple authors to collaborate effectively greatly enhances productivity. Web Distributed Authoring and Versioning (WebDAV), a set of XML-based extensions to HTTP from the IETF, makes it possible for authors to work together to create, enhance, and maintain content. A WebDAV server manages contributions, tracks changes, and enforces permissions. A number of portal servers, including Microsoft's SharePoint Portal Server and Oracle9iAS Portal, use WebDAV to enable the collaborative editing of portal content. Common Web servers such as Apache and IIS also support WebDAV. Any client that speaks the WebDAV protocol can use these servers to collaborate on documents. Such clients include popular content authoring tools such as Adobe Acrobat and Microsoft Office. Taken to an extreme, WebDAV enables the replacement of traditional document management systems with a set of distributed WebDAV-capable servers. Oracle iFS and Xythos's Web File Server use this approach.
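Because WebDAV carries its requests and responses as XML, a client can build and read them with an ordinary XML processor. The Python sketch below constructs the body of a PROPFIND request and extracts properties from a multistatus response. It is a simplified illustration: nothing is sent over the network, the sample path /docs/plan.xml is hypothetical, and the parser ignores the per-property status codes that a real client would check.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"  # the WebDAV XML namespace
ET.register_namespace("D", DAV)


def propfind_body(property_names):
    """Build the XML body of a PROPFIND request for the named properties."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    for name in property_names:
        ET.SubElement(prop, f"{{{DAV}}}{name}")
    return ET.tostring(root, encoding="unicode")


def parse_multistatus(xml_text):
    """Map each href in a multistatus response to its reported properties.

    Simplification: every prop element is read regardless of the status
    reported in its enclosing propstat element.
    """
    result = {}
    for response in ET.fromstring(xml_text).iter(f"{{{DAV}}}response"):
        href = response.findtext(f"{{{DAV}}}href")
        properties = {}
        for prop in response.iter(f"{{{DAV}}}prop"):
            for child in prop:
                local_name = child.tag.split("}", 1)[1]  # drop the namespace
                properties[local_name] = child.text
        result[href] = properties
    return result


# Hypothetical server response for a single document.
SAMPLE_RESPONSE = (
    '<D:multistatus xmlns:D="DAV:"><D:response>'
    "<D:href>/docs/plan.xml</D:href>"
    "<D:propstat><D:prop><D:displayname>plan.xml</D:displayname></D:prop>"
    "<D:status>HTTP/1.1 200 OK</D:status></D:propstat>"
    "</D:response></D:multistatus>"
)
```

An authoring tool would send the output of propfind_body(["displayname", "getlastmodified"]) as the body of a PROPFIND request and pass the server's reply to parse_multistatus.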
It is often more cost-effective to borrow content from someone else than to generate it yourself. However, this type of syndication faces two problems. First, it is often difficult to fit third-party content into an application because of differences in layout. XML solves this problem by giving both parties a format for exchanging information separate from presentation. The subscriber knows the structure of each publisher's content, so it can use XSLT to integrate content from different sources and apply its preferred layout. Second, the parties need a way to negotiate subscriptions, track usage, and update information automatically. Information and Content Exchange (ICE) addresses these issues by providing a standard XML protocol for such interactions between subscribers and publishers. ICE support is available in a wide variety of products that generate and manage content, including Interwoven's OpenSyndicate, Oracle9i, and Vignette's Content Syndication Server.