Storage Systems
Once you have acquired the necessary fundamental components, you can create, access, and manipulate data in XML documents. But as with all data, you usually want some form of persistent storage that's more robust than the local filesystem. It is not uncommon for projects using XML to stall while figuring out how to address the storage issue. The confusion stems from the fact that there are three vastly different choices: a database management system (DBMS), a content management system (CMS), or a native XML store. The appropriate choice depends on the characteristics of your XML data.
What if you use XML as a data interchange format? In this case, a source application encodes data from its own native format as XML, and a target application decodes the XML data into its own native format. XML is an intermediate data representation. Both the source and target applications already have persistent storage mechanisms, almost certainly DBMSs of one sort or another. There is really no need to store the XML documents persistently themselves, except perhaps for logging purposes.
In fact, the entire purpose of the interchange format is to combine data from an external source with the rest of the data in the DBMS. If you want to access or search data from these interchange documents along with data already in the DBMS, you need to convert it from XML to the DBMS's native format. You may take this approach even further by making XML the lingua franca among different data sources. The discussion of Data Servers in the Server Infrastructure section that follows addresses this option. But even in this sophisticated case, XML remains an intermediate data format. The data is ultimately translated and stored in an existing DBMS.
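The conversion step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `orders` document, its element and attribute names, and the `load_orders` helper are all hypothetical, and an in-memory SQLite database stands in for whatever DBMS the target application actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical interchange document; the element and attribute names are
# invented for illustration and do not come from any real schema.
ORDER_XML = """
<orders>
  <order id="1001"><customer>Acme</customer><total>250.00</total></order>
  <order id="1002"><customer>Globex</customer><total>75.50</total></order>
</orders>
"""


def load_orders(conn, xml_text):
    """Decode an XML interchange document into rows of the target DBMS."""
    root = ET.fromstring(xml_text.strip())
    rows = [
        (
            int(order.get("id")),
            order.findtext("customer"),
            float(order.findtext("total")),
        )
        for order in root.iter("order")
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# SQLite stands in for the target application's existing DBMS (an assumption).
conn = sqlite3.connect(":memory:")
loaded = load_orders(conn, ORDER_XML)

# Once converted, the interchange data can be queried like any other table.
total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]
```

After the load, the XML document itself is no longer needed for querying; it has served its role as an intermediate representation and could be discarded or kept only as a log entry.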
What if you use XML as a content format? In this case, authoring tools generate content as XML, and layout tools generate stylesheets for displaying this content. But content production usually requires higher-level features beyond storage, such as collaborative authoring, rendering to different media, and indexing documents. The following discussion of Content Provisioning Components in the Content Components section examines the need for these higher-level features in more detail. Moreover, you may also have content in other formats that you must manage alongside the XML documents.
In this case, you probably want to use a CMS that addresses persistent storage in conjunction with these other needs. Because most commercial CMSs evolved with the use of SGML, vendors have found it fairly easy to add excellent XML support. So if XML is a content and layout representation, use a CMS. CMS products with XML capabilities include BroadVision Publishing Center, Chrystal's Astoria, Documentum4i, Interwoven TeamSite, OmniMark Technologies' OmniMark, Red Bridge Interactive's DynaBase, SiberLogic's SiberSafe, and Vignette's Content Suite. The discussion of Content Management Components in the Content-Related Components section that follows covers the features of CMSs in more depth.
What if you use XML as an operational data format? Operational data is data that directly drives an application or process. Usually, DBMSs maintain operational data, but there are two cases where XML is likely to be the format. In the first case, XML is the format for an important work product of some kind. As discussed in Chapter 4, the business document architecture uses XML in precisely this manner. Each document represents a completed work product exchanged between organizations. Certainly, organizations will break down this document and map certain portions to corresponding DBMSs. However, the XML document is the starting point for driving this downstream processing and the ultimate point of reference for auditing. In the second case, XML is the format for instructions used in executing a process. Chapter 4 also discussed the emerging class of orchestration applications that use an XML format to describe the assembly of software components or the workflow for business processes.
In either of these situations, neither a traditional DBMS nor a CMS is appropriate. You need to use the XML document as a single unit but still index its internal contents. Traditional DBMSs do this poorly because they must either disassemble the document into their internal formats or create special functions for treating documents as Large Objects (LOBs). Traditional CMSs do this poorly because they are not optimized for subsecond response under high request loads. So if XML is an operational data representation, use a native XML store. Such products include Ipedo XML Database, IXIASOFT's TEXTML Server, NeoCore XMS, Software AG's Tamino, and XYZFind Server. The products mentioned in the Data Server discussion of the Server Infrastructure section can also store native XML data and are particularly useful when your application has a combination of native XML and traditional DBMS data.
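To make the requirement concrete, here is a toy sketch of keeping each document as a single unit while still indexing its internal contents. It does not reflect how any of the products named above work; the `TinyXmlStore` class, its methods, and the sample invoices are hypothetical, and everything lives in memory.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class TinyXmlStore:
    """Toy store: documents stay whole, element text is indexed by path."""

    def __init__(self):
        self.documents = {}            # doc_id -> original XML text, untouched
        self.index = defaultdict(set)  # (element path, text value) -> doc_ids

    def put(self, doc_id, xml_text):
        # Parsing also rejects documents that are not well formed.
        # Note: re-putting an existing doc_id would leave stale index
        # entries; a real store would remove them first.
        root = ET.fromstring(xml_text)
        self.documents[doc_id] = xml_text
        self._index_element(doc_id, root, "/" + root.tag)

    def _index_element(self, doc_id, elem, path):
        text = (elem.text or "").strip()
        if text:
            self.index[(path, text)].add(doc_id)
        for child in elem:
            self._index_element(doc_id, child, path + "/" + child.tag)

    def find(self, path, value):
        # Look up by internal content, return the complete documents.
        doc_ids = self.index.get((path, value), ())
        return [self.documents[d] for d in sorted(doc_ids)]


store = TinyXmlStore()
store.put("po-1", "<invoice><buyer>Acme</buyer><status>paid</status></invoice>")
store.put("po-2", "<invoice><buyer>Globex</buyer><status>open</status></invoice>")

# Search on a value inside the documents; get whole documents back.
hits = store.find("/invoice/status", "open")
```

The design point is that the lookup goes through the index, but what comes back is the original document text, never a reassembly from decomposed rows. That is the property that matters when the document is the ultimate point of reference for auditing.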