16.7 Related Work
In this section, we first present related work on XML query languages, which are useful for view definition. We then review approaches to storing XML data and give an overview of XML data integration projects.
16.7.1 Query Languages for XML
There is not yet a W3C standard for an XML query language. However, the W3C has recently proposed a working draft for a future query language standard: XQuery (Boag et al. 2002). XQuery is derived from a query language named Quilt (Chamberlin et al. 2000), which borrowed features from existing query languages. From XPath (Clark and DeRose 1999) and XQL (Robie et al. 1998), it took the ability to address parts of an XML document. From XML-QL (Deutsch, Fernandez, Florescu et al. 1999), it took restructuring capabilities. XML-QL is based on pattern matching and uses variables to define a result pattern. We also use the concept of pattern matching for our view specification.
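The idea of matching a pattern, binding variables, and building a result from those bindings can be illustrated with a minimal Python sketch. It is not XML-QL or XQuery syntax, and it is not our view specification language. The sample document, the variable names t and a, and the functions match_books and build_result are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, invented for this illustration.
SOURCE = """
<bib>
  <book><title>Title A</title><author>Author X</author></book>
  <book><title>Title B</title><author>Author Y</author></book>
  <book><title>Title C</title></book>
</bib>
"""

def match_books(root):
    """Bind variables t and a for each <book> that has a title and an author."""
    bindings = []
    for book in root.findall("./book"):      # path expression selects candidates
        title = book.findtext("title")
        author = book.findtext("author")
        if title is not None and author is not None:   # pattern must match fully
            bindings.append({"t": title, "a": author})
    return bindings

def build_result(bindings):
    """Instantiate the result pattern once per set of variable bindings."""
    result = ET.Element("result")
    for b in bindings:
        entry = ET.SubElement(result, "entry")
        ET.SubElement(entry, "writer").text = b["a"]
        ET.SubElement(entry, "work").text = b["t"]
    return result

root = ET.fromstring(SOURCE)
bindings = match_books(root)
view = build_result(bindings)
print(ET.tostring(view, encoding="unicode"))
```

The third book has no author, so the pattern does not match it and it contributes nothing to the restructured result.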
16.7.2 Storing XML Data
Many approaches have been proposed for storing XML data in databases. We presented the main techniques in section 16.5.1, "The Different Approaches to Storing XML." Different mapping schemas for relational databases have also been proposed (e.g., Manolescu et al. 2000; Yoshikawa et al. 2001; Sha et al. 1999). Florescu and Kossmann compare several mappings using performance evaluations (Florescu and Kossmann 1999a). Mappings based on DTDs have also been proposed (Shanmugasundaram et al. 1999). The STORED system has explored a mapping technique using an object-oriented database system (Deutsch, Fernandez, and Suciu 1999). Recently, the LegoDB system (Bohannon et al. 2002) has proposed a mapping technique using adaptive shredding.
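To make the notion of an XML-to-relational mapping concrete, the following minimal sketch shreds a document into a single generic table with one row per element, and then answers a path query with a self-join. It is only one simple possibility and is not claimed to be the mapping of any system cited above. The table name edge, its columns, and the sample document are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = "<bib><book><title>Title A</title></book><book><title>Title B</title></book></bib>"

def shred(conn, elem, parent_id=None, position=0):
    """Store elem and its descendants, one row per element, in document order."""
    value = (elem.text or "").strip() or None   # keep only non-empty text content
    cur = conn.execute(
        "INSERT INTO edge (parent, pos, tag, value) VALUES (?, ?, ?, ?)",
        (parent_id, position, elem.tag, value),
    )
    node_id = cur.lastrowid                     # key generated for this element
    for i, child in enumerate(elem):
        shred(conn, child, node_id, i)
    return node_id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
    "pos INTEGER, tag TEXT, value TEXT)"
)
shred(conn, ET.fromstring(DOC))

# The path book/title becomes a self-join on the parent column.
rows = conn.execute(
    "SELECT c.value FROM edge AS p JOIN edge AS c ON c.parent = p.id "
    "WHERE p.tag = 'book' AND c.tag = 'title' ORDER BY c.id"
).fetchall()
titles = [r[0] for r in rows]
print(titles)
```

A generic table like this one accepts any document without a DTD, at the price of one join per step of a path expression. That trade-off is the kind of question the performance comparisons cited above address.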
16.7.3 Systems for XML Data Integration
Many research projects have focused on XML data integration, given the importance of this topic.
The MIX (Mediation of Information using XML; http://www.npaci.edu/DICE/MIX) system was designed for mediation of heterogeneous data sources. The system relies on wrappers to export heterogeneous sources. The work by C. Baru (Baru 1999) deals with relational-to-XML mapping. Views are defined with XMAS (Ludäscher et al. 1999), which was inspired by XML-QL. The language comes with a graphic interface (BBQ) but considers only XML documents that are validated by a DTD. As the approach is virtual, XML data storage has not been considered.
Xyleme is an XML data warehouse system designed to store all XML data on the Web. This ambitious aim raises interesting issues. XML data acquisition and maintenance are studied by Mignet et al. and Marian et al. (Mignet et al. 2000; Marian et al. 2000). XML data are stored in a special-purpose DBMS named NATIX (Kanne and Moerkotte 1999), which uses the hybrid approach we described earlier. To provide a unified view of data stored in the warehouse, Xyleme provides an abstract DTD that can be seen as the ontology of a domain. A mapping is then defined between the DTD of the stored documents (concrete DTD) and the DTD of the domain modeled by the document (abstract DTD) (Reynaud et al. 2001). Compared to our system, Xyleme aims at storing all XML documents dealing with a domain, without consideration of storage space, while our approach allows us to filter the XML data to be stored by means of a view specification mechanism.
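The role of a mapping between an abstract DTD and concrete DTDs can be pictured with a toy sketch in which each abstract path is translated into the path used by each source. It does not reproduce Xyleme's actual mapping formalism, which is not detailed here. The path table, the source names, and the documents below are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: abstract path -> concrete path, for each source.
MAPPINGS = {
    "source1": {"artwork/title": "painting/name",
                "artwork/creator": "painting/painter"},
    "source2": {"artwork/title": "work/title",
                "artwork/creator": "work/artist"},
}

# Hypothetical stored documents, each following its own concrete structure.
DOCUMENTS = {
    "source1": ET.fromstring(
        "<museum><painting><name>P1</name>"
        "<painter>Artist X</painter></painting></museum>"),
    "source2": ET.fromstring(
        "<catalog><work><title>W1</title>"
        "<artist>Artist Y</artist></work></catalog>"),
}

def query_abstract(abstract_path, documents, mappings):
    """Answer a query on the abstract structure by rewriting it per source."""
    values = []
    for source, root in documents.items():
        concrete_path = mappings[source].get(abstract_path)
        if concrete_path is None:        # this source has no counterpart
            continue
        values.extend(e.text for e in root.findall(concrete_path))
    return values

print(query_abstract("artwork/title", DOCUMENTS, MAPPINGS))
print(query_abstract("artwork/creator", DOCUMENTS, MAPPINGS))
```

The user formulates a query once against the abstract structure, and the mapping hides the fact that the two sources name and nest the same information differently.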
Recently, an original system for optimizing XML data storage has been proposed. LegoDB (Bohannon et al. 2002) is a cost-based XML storage-mapping engine that explores a space of XML-to-relational mappings and selects the best mapping for a given application. The parameters used to find the best mapping are (1) an extension of the XML Schema containing data statistics on sources and (2) an XQuery workload. LegoDB cannot be considered a complete integration system because it addresses only storage: it proposes an efficient way of storing XML data for a given XQuery workload.
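The principle of workload-driven selection can be sketched as follows: each candidate mapping has an estimated cost for each query, and the mapping with the lowest weighted total wins. This is a toy illustration of the principle, not LegoDB's search algorithm or cost model. The mapping names, the query names, and all cost figures are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    name: str
    weight: float   # relative frequency of the query in the workload

def workload_cost(query_costs, workload):
    """Weighted sum of the estimated per-query costs under one mapping."""
    return sum(q.weight * query_costs[q.name] for q in workload)

def pick_mapping(candidates, workload):
    """Return the name of the candidate mapping with the lowest workload cost."""
    return min(candidates, key=lambda name: workload_cost(candidates[name], workload))

# Hypothetical estimated costs of two queries under two candidate mappings.
CANDIDATES = {
    "inlined":  {"lookup": 1.0, "publish": 8.0},
    "outlined": {"lookup": 4.0, "publish": 2.0},
}

lookup_heavy = [Query("lookup", 0.9), Query("publish", 0.1)]
publish_heavy = [Query("lookup", 0.1), Query("publish", 0.9)]

print(pick_mapping(CANDIDATES, lookup_heavy))
print(pick_mapping(CANDIDATES, publish_heavy))
```

With these invented figures, the two workloads lead to different choices, which shows why the workload has to be an input to the selection of a storage mapping.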