XML-DB2 Guidelines
Consider the following guidelines as you embark on using XML with your DB2 databases.
Learn All You Can About XML
Before you begin to mix XML and DB2, be sure that you have a solid grasp of XML. The short introduction in this chapter is merely the tip of the iceberg. You must understand that XML is hierarchical and, as such, cannot match up exactly with your relational, DB2 way of thinking and processing data.
For in-depth coverage of pureXML support in DB2 for z/OS, refer to the IBM DB2 for z/OS pureXML Guide (SC19-2981) and the IBM Redbook Extremely pureXML in DB2 10 for z/OS (SG24-7915).
Consider augmenting the information in the pureXML manual with additional sources. The DB2 for z/OS pureXML section of the IBM developerWorks® website at http://www.ibm.com/developerworks/wikis/display/db2xml/DB2+for+zOS+pureXML contains many useful examples and articles. Another useful reference is the IBM Press book DB2 pureXML Cookbook, which covers pureXML for both DB2 for LUW and DB2 for z/OS. You can find additional information on XML at the following websites:
http://www.oasis-open.org
http://www.w3schools.com
http://www.xml.org
Find XMLEXISTS Predicates for Indexing
Because XML indexes are used only on the XMLEXISTS predicate, it is a good idea to find the predicates within XMLEXISTS clauses before doing any XML indexing. Look for predicates such as [@id = xxx] or [price > 100.00].
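As a rough illustration of where to look, the following sketch shows a query whose XMLEXISTS clause contains this kind of predicate. The CUST table and XMLCUST column follow the examples in this section; the CUSTNO column and the search value are assumptions made purely for illustration.

```sql
-- Sketch only: CUSTNO and the value "Smith" are assumed for illustration.
-- The bracketed predicate inside XMLEXISTS, [last_name = "Smith"],
-- is the kind of predicate to note as a candidate for an XML index.
SELECT C.CUSTNO
FROM   CUST C
WHERE  XMLEXISTS('$d/customer/custname[last_name = "Smith"]'
                 PASSING C.XMLCUST AS "d");
```

In this sketch, the path worth considering for an index is /customer/custname/last_name.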
Favor Creating Lean XML Indexes
Assume your queries often search for customer documents by last_name. In that case, an index on the last_name element can improve the performance of such queries, for example:
CREATE INDEX CUSTLNX1 ON CUST(XMLCUST) generate key using xmlpattern '/customer/custname/last_name' as sql varchar(20);
Use Caution Before Indexing Everything
As a general rule of thumb, avoid indexing every node in your XML documents. Such an index, known as a heavy index, is costly to maintain during INSERT, UPDATE, and DELETE processing. A heavy index also requires a lot of storage, which might be better used for more targeted indexes. For example, consider the following heavy index:
CREATE INDEX HEAVYIX ON CUST(XMLCUST) generate key using xmlpattern '//*' as sql varchar(100);
When you use the XML pattern '//*' to create an XML index, the index can contain a key entry for every text node in every XML document in the XML column. Because of the creation and maintenance overhead, avoid such heavy indexes.
An exception might be made for applications with low write activity and an unpredictable query workload, which makes specific indexes hard to anticipate and define.
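By contrast, a targeted index covers only a path that queries actually search. The following is a minimal sketch, assuming that queries frequently filter on the customer ZIP code; the index name CUSTZIPX, the path, and the VARCHAR(10) key length are hypothetical choices for illustration.

```sql
-- Sketch only: CUSTZIPX, the zip_code path, and VARCHAR(10) are assumptions.
-- Only zip_code values reached by this one path generate index keys,
-- so the index stays small and is cheaper to maintain than '//*'.
CREATE INDEX CUSTZIPX ON CUST(XMLCUST)
  GENERATE KEY USING XMLPATTERN '/customer/addr/zip_code'
  AS SQL VARCHAR(10);
```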
Favor XPath Expressions with Fully Specified Paths
Avoid using * and // in your path expressions; instead, use fully specified paths whenever possible. For example, assume that you need to retrieve customers’ ZIP codes. You could code more than one path expression to reach the data: both /customer/addr/zip_code and /customer/*/zip_code return the ZIP code. For optimal performance, prefer the fully specified path because it enables DB2 to navigate directly to the desired elements and skip the irrelevant parts of the document.
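The two path styles can be compared side by side in a query. This is a sketch under the assumption that the documents are stored in the XMLCUST column of the CUST table, as in the index examples in this section; the variable name d is arbitrary.

```sql
-- Preferred: every step of the path is named, so DB2 can go
-- straight to zip_code under addr.
SELECT XMLQUERY('$d/customer/addr/zip_code'
                PASSING C.XMLCUST AS "d")
FROM   CUST C;

-- Avoid where possible: the * step matches any child element of
-- customer, so more of each document may have to be examined.
SELECT XMLQUERY('$d/customer/*/zip_code'
                PASSING C.XMLCUST AS "d")
FROM   CUST C;
```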
Use RUNSTATS to Gather Statistics on XML Data and Indexes
The RUNSTATS utility has been extended to collect statistics on XML data and XML indexes. The DB2 Optimizer uses these statistics to generate efficient execution plans for SQL/XML queries. Thus, continue to use RUNSTATS as you would for relational data. Simply stated, DB2 generates better access plans if XML statistics are available.
Use CHECK DATA
Consider running the CHECK DATA utility periodically to verify that XML document data is consistent with both its associated XML schema and its XML index data.
Use REPORT TABLESPACE SET
Use the REPORT TABLESPACE SET utility to identify the underlying XML objects that are automatically created.
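If you also want to look at the DB2 Catalog directly, a query along the following lines can complement the utility report. It is a sketch that assumes the SYSIBM.SYSXMLRELS catalog table and its TBOWNER, TBNAME, COLNAME, XMLTBOWNER, and XMLTBNAME columns; verify the names against the catalog documentation for your release. The table name CUST is taken from the index examples in this section.

```sql
-- Sketch only: confirm the catalog table and column names for your
-- DB2 release before relying on this query.
-- Lists each XML column of CUST with the internal XML table
-- that DB2 created to hold its documents.
SELECT TBOWNER, TBNAME, COLNAME, XMLTBOWNER, XMLTBNAME
FROM   SYSIBM.SYSXMLRELS
WHERE  TBNAME = 'CUST';
```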
Consider Deferring the Creation of XML Table Spaces
As of DB2 V10, you can defer the actual physical creation of XML table spaces and their associated indexes to optimize your space management requirements.
By specifying DEFINE NO, the underlying VSAM data sets are not created until the first INSERT or LOAD operation. The undefined XML table spaces and dependent index spaces are registered in the DB2 Catalog but are considered empty when access is attempted before data is inserted or loaded.
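The following sketch shows one way this might look in DDL, where the option is written as DEFINE NO on CREATE TABLESPACE. It assumes that the implicitly created XML table space and its indexes inherit the deferral from the base table space. The database name DBCUST, the table space name TSCUST, and the CUSTNO column are hypothetical.

```sql
-- Sketch only: DBCUST, TSCUST, and CUSTNO are hypothetical names.
-- DEFINE NO on the base table space defers creation of the
-- underlying data sets.
CREATE TABLESPACE TSCUST IN DBCUST
  DEFINE NO;

-- The XML column causes DB2 to create an XML table space implicitly;
-- this sketch assumes it is deferred along with the base table space.
CREATE TABLE CUST
  (CUSTNO   INTEGER NOT NULL,
   XMLCUST  XML)
  IN DBCUST.TSCUST;
```

Until the first INSERT or LOAD against CUST, the objects exist only as DB2 Catalog entries.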
DSNZPARMS: XMLVALA and XMLVALS
The XMLVALA subsystem parameter specifies an upper limit for the amount of storage that each user is to have available for storing XML values. The default is 200 MB. DB2 performs streaming, so you might be able to insert and select XML documents larger than the limit. However, it is a good idea to check the value and set an appropriate value based on your expected XML processing needs.
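If you construct XML documents, set XMLVALA to at least twice the maximum length of the documents generated. If you query XML data, set XMLVALA to at least four times the maximum document size. For example, if the largest document you query is 60 MB, XMLVALA should be at least 240 MB, which exceeds the 200 MB default.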
XMLVALS is the virtual storage limit allowed for XML processing for the DB2 subsystem. The default value is 10 GB.