Conclusion
This chapter has presented a variety of concepts pertaining to mapping between XML and relational data. Facilities for accomplishing this mapping are provided by the database vendors and by third-party software manufacturers (either XML specialty vendors or XML-based middleware vendors); in some cases they may be written directly in stored procedures or applications. Vendors, especially database vendors, will also provide facilities for executing XQuery, WebDAV, or other XML processing models directly on top of the relational data model, bypassing translation at the data level.
Which features are actually required for mapping between XML and relational data depends on both the application and the (possibly evolving) XML schema of the data that needs to be supported. The materials in this chapter can be used to assess those requirements and compare them against the capabilities available from various sources.
We have looked at two fundamentally different approaches to storing XML data in relational databases and examined to some degree the approaches used to implement applications on each representation. If we look at both of these representations, we can see some clear contrasts in their applicability, particularly in two areas: support for a wide variety of XML, and support for application types.
LOB representation has some clear advantages in certain situations. If an application requires only a logically accurate view of the data contained in an XML document, in elements and attributes, then both representations are equally applicable. But other applications may require preservation of XML comments, XML processing instructions, or exact preservation of whitespace—or, in general, exact reproduction of an initial XML document. These requirements complicate a composed representation considerably, beyond the capabilities of most commercial applications, whereas they are quite simple for LOB representations.
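The fidelity difference can be made concrete with a small sketch. It uses Python's built-in sqlite3 and xml.etree.ElementTree modules as stand-ins for a production database and XML parser; the table names (order_lob, order_item), the sample document, and the function store_and_reload are hypothetical and chosen only for illustration. The composed mapping here keeps only elements and attributes, which is one simple design among many.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source document with a comment, a processing
# instruction, and significant layout whitespace.
SOURCE = (
    '<?xml version="1.0"?>\n'
    '<!-- reviewed by purchasing -->\n'
    '<order id="17">\n'
    '  <?render compact?>\n'
    '  <item sku="A1" qty="2"/>\n'
    '  <item sku="B7" qty="1"/>\n'
    '</order>\n'
)

def store_and_reload():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE order_lob (id INTEGER PRIMARY KEY, doc TEXT)")
    db.execute("CREATE TABLE order_item (order_id INTEGER, sku TEXT, qty INTEGER)")

    # The default ElementTree parser discards comments and
    # processing instructions while building the tree.
    root = ET.fromstring(SOURCE)
    order_id = int(root.get("id"))

    # LOB representation: the document is stored as one opaque value.
    db.execute("INSERT INTO order_lob VALUES (?, ?)", (order_id, SOURCE))

    # Composed representation: only elements and attributes are shredded.
    for item in root.findall("item"):
        db.execute(
            "INSERT INTO order_item VALUES (?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty"))),
        )

    lob_doc = db.execute(
        "SELECT doc FROM order_lob WHERE id = ?", (order_id,)
    ).fetchone()[0]

    # Rebuild a document from the rows; anything not shredded is gone.
    rebuilt = ET.Element("order", id=str(order_id))
    rows = db.execute(
        "SELECT sku, qty FROM order_item WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    )
    for sku, qty in rows:
        ET.SubElement(rebuilt, "item", sku=sku, qty=str(qty))
    composed_doc = ET.tostring(rebuilt, encoding="unicode")
    return lob_doc, composed_doc
```

In this sketch the LOB comes back identical to the input, while the rebuilt document carries the same items but no comment, processing instruction, or original whitespace. Preserving those in a composed representation would require extra tables or columns for each kind of node.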
Certain other kinds of XML structures are possible under composed representations, but may have limited support; these include XML Schema concepts such as substitution groups, recursive schemas, and mixed-content XML (mixed content is easily represented in a composed representation, but may not be so easily queryable). Again, none of these is at all difficult for LOB representations, though they may complicate indexing of LOBs.
Finally, LOB is really the only alternative in situations where the schema of the XML is not known or may evolve in an ad-hoc manner. Because composed representations rely on a fixed relational schema, they are not easily adaptable to changing XML schemas. We did see one example of support for a limited kind of variability (an overflow field to hold any data matching an xs:any portion of a schema), but in general composed representations are unable to query or access any XML structure for which they have not been prepared in advance.
So LOBs can represent a wider variety of XML, which is important if the application must handle source XML data that uses any of these features. When it comes to application types, we also have differences.
The most basic form of application, emitting an XML document, is trivial for an LOB representation. The composed representation requires generating potentially complex queries to reproduce the output. As a result, performance for composed representations degrades as documents grow; whether this becomes a problem in practice depends on the size and complexity of the XML documents.
All of the above might make it sound as if the LOB representation is the obvious choice. However, most complex query operations (selection, extraction, and recombination of data) clearly favor the composed representation. Even with indexes, processing of LOBs can easily require scanning all the data, which cancels out much of the power of the relational database. While the algorithms to support complex query or update operations on composed representations are themselves quite complex, they can exploit the relational database engine to make the actual computation efficient.
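As a rough illustration of this contrast, the following sketch answers the same question (the total quantity ordered for one SKU) against both representations. SQLite and ElementTree are again stand-ins, and every name (DOCS, order_lob, order_item, total_qty_lob, total_qty_composed) is hypothetical. The LOB path here assumes no XML-aware index is available, which is the worst case the text describes.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: order id -> XML document.
DOCS = {
    1: '<order><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>',
    2: '<order><item sku="C3" qty="5"/></order>',
    3: '<order><item sku="A1" qty="4"/></order>',
}

def build_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE order_lob (id INTEGER PRIMARY KEY, doc TEXT)")
    db.execute("CREATE TABLE order_item (order_id INTEGER, sku TEXT, qty INTEGER)")
    db.execute("CREATE INDEX order_item_sku ON order_item (sku)")
    for order_id, doc in DOCS.items():
        db.execute("INSERT INTO order_lob VALUES (?, ?)", (order_id, doc))
        for item in ET.fromstring(doc).iter("item"):
            db.execute(
                "INSERT INTO order_item VALUES (?, ?, ?)",
                (order_id, item.get("sku"), int(item.get("qty"))),
            )
    return db

def total_qty_lob(db, sku):
    # Every stored document is fetched and parsed, whether or not
    # it mentions the SKU: a full scan done in application code.
    total = 0
    for (doc,) in db.execute("SELECT doc FROM order_lob"):
        for item in ET.fromstring(doc).iter("item"):
            if item.get("sku") == sku:
                total += int(item.get("qty"))
    return total

def total_qty_composed(db, sku):
    # The database can use the index on sku and aggregate internally.
    row = db.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM order_item WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0]
```

Both functions return the same totals, but only the composed version lets the query planner restrict the work to matching rows. Products that index LOB content can narrow this gap, at the cost of maintaining those indexes.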
Update applications show a similar pattern, with the added proviso that update is really only feasible for certain composed representations that permit “reversible” mappings. Update for LOBs is simple in principle, but will often require complete scans, and will always require constructing complete new data values, both of which hamper performance.
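The cost difference for updates can be sketched in the same hypothetical setting (SQLite and ElementTree as stand-ins; table and function names invented for illustration). The composed mapping used here is trivially reversible because each item element corresponds to exactly one row. The LOB update assumes the target document is already known; locating it by content would add the scan mentioned above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document for order 1.
DOC = '<order><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>'

def build_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE order_lob (id INTEGER PRIMARY KEY, doc TEXT)")
    db.execute("CREATE TABLE order_item (order_id INTEGER, sku TEXT, qty INTEGER)")
    db.execute("INSERT INTO order_lob VALUES (1, ?)", (DOC,))
    for item in ET.fromstring(DOC).iter("item"):
        db.execute(
            "INSERT INTO order_item VALUES (1, ?, ?)",
            (item.get("sku"), int(item.get("qty"))),
        )
    return db

def set_qty_lob(db, order_id, sku, qty):
    # Read the whole value, parse it, change one attribute,
    # then serialize and write back a complete new value.
    (doc,) = db.execute(
        "SELECT doc FROM order_lob WHERE id = ?", (order_id,)
    ).fetchone()
    root = ET.fromstring(doc)
    for item in root.iter("item"):
        if item.get("sku") == sku:
            item.set("qty", str(qty))
    db.execute(
        "UPDATE order_lob SET doc = ? WHERE id = ?",
        (ET.tostring(root, encoding="unicode"), order_id),
    )

def set_qty_composed(db, order_id, sku, qty):
    # Only the affected row is touched.
    db.execute(
        "UPDATE order_item SET qty = ? WHERE order_id = ? AND sku = ?",
        (qty, order_id, sku),
    )
```

The LOB path rewrites the entire document to change a single attribute, so its cost grows with document size, whereas the composed path costs roughly the same regardless of how large the order is.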
It is clear that no single approach, whether LOB or any of the variety of composed representations, is best for all situations. As support for XML data in databases continues to evolve, we can expect improved support for and performance of both representations, and fewer differences between them.