- 11.1 Advantages and Disadvantages of Shredding
- 11.2 Shredding with the XMLTABLE Function
- 11.3 Shredding with Annotated XML Schemas
- 11.4 Summary
11.2 Shredding with the XMLTABLE Function
The XMLTABLE function is an SQL table function that uses XQuery expressions to create relational rows from an XML input document. For details on the XMLTABLE function, see Chapter 7, Querying XML Data with SQL/XML. In this section we describe how to use the XMLTABLE function in an SQL INSERT statement to perform shredding. We use the shredding scenario in Figure 11.1 as an example.
The first step is to create the relational target tables, if they don't already exist. For the scenario in Figure 11.1 the target tables are defined as follows:
CREATE TABLE address(cid INTEGER, name VARCHAR(30), street VARCHAR(40), city VARCHAR(30)) CREATE TABLE phones(cid INTEGER, phonetype VARCHAR(10), phonenum VARCHAR(20))
Based on the definition of the target tables you construct the INSERT statements that shred incoming XML documents. The INSERT statements have to be of the form INSERT INTO ... SELECT ... FROM ... XMLTABLE, as shown in Figure 11.2. Each XMLTABLE function contains a parameter marker ("?") through which an application can pass the XML document that is to be shredded. SQL typing rules require the parameter marker to be cast to the appropriate data type. The SELECT clause selects columns produced by the XMLTABLE function for insert into the address and phones tables, respectively.
Figure 11.2. Inserting XML element and attribute values into relational columns
INSERT INTO address(cid, name, street, city) SELECT x.custid, x.custname, x.str, x.place FROM XMLTABLE('$i/customerinfo' PASSING CAST(? AS XML) AS "i" COLUMNS custid INTEGER PATH '@Cid', custname VARCHAR(30) PATH 'name', str VARCHAR(40) PATH 'addr/street', place VARCHAR(30) PATH 'addr/city' ) AS x ; INSERT INTO phones(cid, phonetype, phonenum) SELECT x.custid, x.ptype, x.number FROM XMLTABLE('$i/customerinfo/phone' PASSING CAST(? AS XML) AS "i" COLUMNS custid INTEGER PATH '../@Cid', number VARCHAR(15) PATH '.', ptype VARCHAR(10) PATH './@type') AS x ; |
To populate the two target tables as illustrated in Figure 11.1, both INSERT statements have to be executed with the same XML document as input. One approach is that the application issues both INSERT statements in one transaction and binds the same XML document to the parameter markers for both statements. This approach works well but can be optimized, because the same XML document is sent from the client to the server and parsed at the DB2 server twice, once for each INSERT statement. This overhead can be avoided by combining both INSERT statements in a single stored procedure. The application then only makes a single stored procedure call and passes the input document once, regardless of the number of INSERT statements in the stored procedure. Chapter 18, Using XML in Stored Procedures, UDFs, and Triggers, demonstrates such a stored procedure as well as other examples of manipulating XML data in stored procedures and user-defined functions.
Alternatively, the INSERT statements in Figure 11.2 can read a set of input documents from an XML column. Suppose the documents have been loaded into the XML column info of the customer table. Then you need to modify one line in each of the INSERT statements in Figure 11.2 to read the input document from the customer table:
FROM customer, XMLTABLE('$i/customerinfo' PASSING info AS "i"
Loading the input documents into a staging table can be advantageous if you have to shred many documents. The LOAD utility parallelizes the parsing of XML documents, which reduces the time to move the documents into the database. When the documents are stored in an XML column in parsed format, the XMLTABLE function can shred the documents without XML parsing.
The INSERT statements can be enriched with XQuery or SQL functions or joins to tailor the shredding process to specific requirements. Figure 11.3 provides an example. The SELECT clause contains the function RTRIM to remove trailing blanks from the column x.ptype. The row-generating expression of the XMLTABLE function contains a predicate that excludes home phone numbers from being shredded into the target table. The column-generating expression for the phone numbers uses the XQuery function normalize-space, which strips leading and trailing whitespace and replaces each internal sequence of whitespace characters with a single blank character. The statement also performs a join to the lookup table areacodes so that a phone number is inserted into the phones table only if its area code is listed in the areacodes table.
Figure 11.3. Using functions and joins to customize the shredding
INSERT INTO phones(cid, phonetype, phonenum) SELECT x.custid, RTRIM(x.ptype), x.number FROM areacodes a, XMLTABLE('$i/customerinfo/phone[@type != "home"]' PASSING CAST(? AS XML) AS "i" COLUMNS custid INTEGER PATH '../@Cid', number VARCHAR(15) PATH 'normalize-space(.)', ptype VARCHAR(10) PATH './@type') AS x WHERE SUBSTR(x.number,1,3) = a.code; |
11.2.1 Hybrid XML Storage
In many situations the complexity of the XML document structures makes shredding difficult, inefficient, and undesirable. Besides the performance penalty of shredding, scattering the values of an XML document across a large number of tables can make it difficult for an application developer to understand and query the data. To improve XML insert performance and to reduce the number of tables in your database, you may want to store XML documents in a hybrid manner. This approach extracts the values of selected XML elements or attributes and stores them in relational columns alongside the full XML document.
The example in the previous section used two tables, address and phones, as the target tables for shredding the customer documents. You might prefer to use just a single table that contains the customer cid, name, and city values in relational columns and the full XML document with the repeating phone elements and other information in an XML column. You can define the following table:
CREATE TABLE hybrid(cid INTEGER NOT NULL PRIMARY KEY, name VARCHAR(30), city VARCHAR(25), info XML)
Figure 11.4 shows the INSERT statement to populate this table. The XMLTABLE function takes an XML document as input via a parameter marker. The column definitions in the XMLTABLE function produce four columns that match the definition of the target table hybrid. The row-generating expression in the XMLTABLE function is just $i, which produces the full input document. This expression is the input for the column-generating expressions in the COLUMNS clause of the XMLTABLE function. In particular, the column expression '.' returns the full input document as-is and produces the XML column doc for insert into the info column of the target table.
Figure 11.4. Storing an XML document in a hybrid fashion
INSERT INTO hybrid(cid, name, city, info) SELECT x.custid, x.custname, x.city, x.doc FROM XMLTABLE('$i' PASSING CAST(? AS XML) AS "i" COLUMNS custid INTEGER PATH 'customerinfo/@Cid', custname VARCHAR(30) PATH 'customerinfo/name', city VARCHAR(25) PATH 'customerinfo/addr/city', doc XML PATH '.' ) AS x; |
It is currently not possible to define check constraints in DB2 to enforce the integrity between relational columns and values in an XML document in the same row. You can, however, define INSERT and UPDATE triggers on the table to populate the relational columns automatically whenever a document is inserted or updated. Triggers are discussed in Chapter 18, Using XML in Stored Procedures, UDFs, and Triggers.
It can be useful to test such INSERT statements in the DB2 Command Line Processor (CLP). For this purpose you can replace the parameter marker with a literal XML document as shown in Figure 11.5. The literal document is a string that must be enclosed in single quotes and converted to the data type XML with the XMLPARSE function. Alternatively, you can read the input document from the file system with one of the UDFs that were introduced in Chapter 4, Inserting and Retrieving XML Data. The use of a UDF is demonstrated in Figure 11.6.
Figure 11.5. Hybrid insert statement with a literal XML document
INSERT INTO hybrid(cid, name, city, info) SELECT x.custid, x.custname, x.city, x.doc FROM XMLTABLE('$i' PASSING XMLPARSE(document '<customerinfo Cid="1001"> <name>Kathy Smith</name> <addr country="Canada"> <street>25 EastCreek</street> <city>Markham</city> <prov-state>Ontario</prov-state> <pcode-zip>N9C 3T6</pcode-zip> </addr> <phone type="work">905-555-7258</phone> </customerinfo>') AS "i" COLUMNS custid INTEGER PATH 'customerinfo/@Cid', custname VARCHAR(30) PATH 'customerinfo/name', city VARCHAR(25) PATH 'customerinfo/addr/city', doc XML PATH '.' ) AS x; |
Figure 11.6. Hybrid insert statement with a "FromFile" UDF
INSERT INTO hybrid(cid, name, city, info) SELECT x.custid, x.custname, x.city, x.doc FROM XMLTABLE('$i' PASSING XMLPARSE(document blobFromFile('/xml/mydata/cust0037.xml')) AS "i" COLUMNS custid INTEGER PATH 'customerinfo/@Cid', custname VARCHAR(30) PATH 'customerinfo/name', city VARCHAR(25) PATH 'customerinfo/addr/city', doc XML PATH '.' ) AS x; |
The insert logic in Figure 11.4, Figure 11.5, and Figure 11.6 is identical. The only difference is how the input document is provided: via a parameter marker, as a literal string that is enclosed in single quotes, or via a UDF that reads a document from the file system.
11.2.2 Relational Views over XML Data
You can create relational views over XML data using XMLTABLE expressions. This allows you to provide applications with a relational or hybrid view of the XML data without actually storing the data in a relational or hybrid format. This can be useful if you want to avoid the overhead of converting large amounts of XML data to relational format. The basic SELECT ... FROM ... XMLTABLE constructs that were used in the INSERT statements in the previous section can also be used in CREATE VIEW statements.
As an example, suppose you want to create a relational view over the elements of the XML documents in the customer table to expose the customer identifier, name, street, and city values. Figure 11.7 shows the corresponding view definition plus an SQL query against the view.
Figure 11.7. Creating a view over XML data
CREATE VIEW custview(id, name, street, city) AS SELECT x.custid, x.custname, x.str, x.place FROM customer, XMLTABLE('$i/customerinfo' PASSING info AS "i" COLUMNS custid INTEGER PATH '@Cid', custname VARCHAR(30) PATH 'name', str VARCHAR(40) PATH 'addr/street', place VARCHAR(30) PATH 'addr/city' ) AS x; SELECT id, name FROM custview WHERE city = 'Aurora'; ID NAME ----------- ------------------------------ 1003 Robert Shoemaker 1 record(s) selected. |
The query over the view in Figure 11.7 contains an SQL predicate for the city column in the view. The values in the city column come from an XML element in the underlying XML column. You can speed up this query by creating an XML index on /customerinfo/addr/city for the info column of the customer table. DB2 9 for z/OS and DB2 9.7 for Linux, UNIX, and Windows are able to convert the relational predicate city = 'Aurora' into an XML predicate on the underlying XML column so that the XML index can be used. This is not possible in DB2 9.1 and DB2 9.5 for Linux, UNIX, and Windows. In these previous versions of DB2, include the XML column in the view definition and write the search condition as an XML predicate, as in the following query. Otherwise an XML index cannot be used.
SELECT id, name FROM custview WHERE XMLEXISTS('$INFO/customerinfo/addr[city = "Aurora"]')