Converting XML to Relational Data for Use in DB2
- 11.1 Advantages and Disadvantages of Shredding
- 11.2 Shredding with the XMLTABLE Function
- 11.3 Shredding with Annotated XML Schemas
- 11.4 Summary
This chapter describes methods to convert XML documents to rows in relational tables. This conversion is commonly known as shredding or decomposing of XML documents. Given the rich support for XML columns in DB2 you might wonder in which cases it can still be useful or necessary to convert XML data to relational format. One common reason for shredding is that existing SQL applications might still require access to the data in relational format. For example, legacy applications, packaged business applications, or reporting software do not always understand XML and have fixed relational interfaces. Therefore you might sometimes find it useful to shred all or some of the data values of an incoming XML document into rows and columns of relational tables.
In this chapter you learn:
- The advantages and disadvantages of shredding and of different shredding methods (section 11.1)
- How to shred XML data to relational tables using INSERT statements that contain the XMLTABLE function (section 11.2)
- How to use XML Schema annotations that map and shred XML documents to relational tables (section 11.3)
11.1 Advantages and Disadvantages of Shredding
The concept of XML shredding is illustrated in Figure 11.1. In this example, XML documents with customer name, address, and phone information are mapped to two relational tables. The documents can contain multiple phone elements because there is a one-to-many relationship between customers and phones. Hence, phone numbers are shredded into a separate table. Each repeating element, such as phone, leads to an additional table in the relational target schema. Suppose the customer information can also contain multiple email addresses, multiple accounts, a list of most recent orders, multiple products per order, and other repeating items. The number of tables required in the relational target schema can increase very quickly. Shredding XML into a large number of tables can lead to a complex and unnatural fragmentation of your logical business objects that makes application development difficult and error-prone. Querying the shredded data or reassembling the original documents may require complex multiway joins.
Figure 11.1 Shredding of an XML document
Depending on the complexity, variability, and purpose of your XML documents, shredding may or may not be a good option. Table 11.1 summarizes the pros and cons of shredding XML data to relational tables.
Table 11.1. When Shredding Is and Isn't a Good Option
Shredding Can Be Useful When... |
Shredding Is Not A Good Option When... |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In many XML application scenarios the structure and usage of the XML data does not lend itself to easy and efficient shredding. This is the reason why DB2 supports XML columns that allow you to index and query XML data without conversion. Sometimes you will find that your application requirements can be best met with partial shredding or hybrid XML storage.
- Partial shredding means that only a subset of the elements or attributes from each incoming XML document are shredded into relational tables. This is useful if a relational application does not require all data values from each XML document. In cases where shredding each document entirely is difficult and requires a complex relational target schema, partial shredding can simplify the mapping to the relational schema significantly.
- Hybrid XML storage means that upon insert of an XML document into an XML column, selected element or attribute values are extracted and redundantly stored in relational columns.
If you choose to shred XML documents, entirely or partially, DB2 provides you with a rich set of capabilities to do some or all of the following:
- Perform custom transformations of the data values before insertion into relational columns.
- Shred the same element or attribute value into multiple columns of the same table or different tables.
- Shred multiple different elements or attributes into the same column of a table.
- Specify conditions that govern when certain elements are or are not shredded. For example, shred the address of a customer document only if the country is Canada.
- Validate XML documents with an XML Schema during shredding.
- Store the full XML document along with the shredded data.
DB2 9 for z/OS and DB2 9.x for Linux, UNIX, and Windows support two shredding methods:
- SQL INSERT statements that use the XMLTABLE function. This function navigates into an input document and produces one or multiple relational rows for insert into a relational table.
- Decomposition with an annotated XML Schema. Since an XML Schema defines the structure of XML documents, annotations can be added to the schema to define how elements and attributes are mapped to relational tables.
Table 11.2 and Table 11.3 discuss the advantages and disadvantages of the XMLTABLE method and the annotated schema method.
Table 11.2. Considerations for the XMLTABLE Method
Advantages of the XMLTABLE Method |
Disadvantages of the XMLTABLE Method |
|
|
|
|
|
|
Table 11.3. Considerations for Annotated Schema Decomposition
Advantages of the Annotated Schema Method |
Disadvantages of the Annotated Schema Method |
|
|
|
|
|
|