8.5 Mapping to Hash Tables
Another interesting mapping is from XML documents to hash tables. XML is frequently used as the format of software configuration files. For example, Tomcat, the servlet container we explain in Chapter 10, uses a number of XML configuration files (such as server.xml). You may think that many configuration files are simple enough that flat text files would do. In our experience, however, configuration files keep growing as program development continues. At some point we are forced to segment a configuration file into several logical sections. We must also define a basic syntax for delimiters and escape characters, and decide which character encoding to use for international text. XML already provides all three: nested elements for sections, a fixed syntax for markup and escaping, and an encoding declaration. Considering all this, it is worth using XML for configuration files from the start.
Consider the configuration file in Listing 8.12, written for a hypothetical application that generates sales reports.
Listing 8.12 Configuration file, chap08/hashtable/config.xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <productData locationType="file">
    <defaultFileName>c:/productMarketing/products.xml</defaultFileName>
  </productData>
  <customerData locationType="db">
    <databaseURL>jdbc:db2:sales.ibm.com/customer</databaseURL>
    <userId>maruyama</userId>
    <passWord>montelac</passWord>
  </customerData>
  <reportFormat locationType="file">
    <defaultFileName>c:/CEO/monthlyReport.xml</defaultFileName>
    <reportTo>M. Murata</reportTo>
    <reportTo>K. Kosaka</reportTo>
  </reportFormat>
</config>
How can our application access this configuration file? We are unlikely to want to scan the entire file to perform some task over all of it (summing numbers, for example). Instead, each component of our application reads a particular piece of data when it needs it, using the name of the configuration parameter as the key. A hash table with configuration parameters as its keys is therefore a natural data structure for holding the configuration data. If our program needs the value of a configuration parameter called defaultFileName, it can look it up efficiently in the hash table using that name as the key. The bare element name is not enough, however, because defaultFileName appears under both productData and reportFormat. In XML-based configuration files, parameter names can instead be expressed as path expressions, such as /config/productData/defaultFileName, which tell the two apart.[3]
Our configuration file can be expressed as the following hash table. Each value is a list, because an element such as reportTo may occur more than once.
KEY                                    | VALUE
/config/productData/@locationType      | [file]
/config/productData/defaultFileName    | [c:/productMarketing/products.xml]
/config/customerData/@locationType     | [db]
/config/customerData/databaseURL       | [jdbc:db2:sales.ibm.com/customer]
/config/customerData/userId            | [maruyama]
/config/customerData/passWord          | [montelac]
/config/reportFormat/@locationType     | [file]
/config/reportFormat/defaultFileName   | [c:/CEO/monthlyReport.xml]
/config/reportFormat/reportTo          | ["M. Murata", "K. Kosaka"]
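To make keyed access concrete, here is a minimal sketch that fills a java.util.Hashtable by hand with a few entries from the table above and looks them up. The class name PathKeyDemo and its helpers are hypothetical, and the table is built directly rather than by parsing Listing 8.12. It shows that the two defaultFileName parameters are separate keys, that the bare element name is not a key at all, and that a repeated element such as reportTo yields several values.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical demo: path expressions as hash table keys (table filled by hand).
public class PathKeyDemo {

    // Builds a small table using values from Listing 8.12.
    // Each key maps to a Vector because an element such as reportTo may repeat.
    public static Hashtable<String, Vector<String>> sampleTable() {
        Hashtable<String, Vector<String>> table = new Hashtable<>();
        add(table, "/config/productData/defaultFileName", "c:/productMarketing/products.xml");
        add(table, "/config/reportFormat/defaultFileName", "c:/CEO/monthlyReport.xml");
        add(table, "/config/reportFormat/reportTo", "M. Murata");
        add(table, "/config/reportFormat/reportTo", "K. Kosaka");
        return table;
    }

    // Appends a value under the key, creating the Vector on first use.
    static void add(Hashtable<String, Vector<String>> table, String key, String value) {
        table.computeIfAbsent(key, k -> new Vector<>()).add(value);
    }

    public static void main(String[] args) {
        Hashtable<String, Vector<String>> table = sampleTable();
        // Same element name, different paths: two independent parameters.
        System.out.println(table.get("/config/productData/defaultFileName"));
        System.out.println(table.get("/config/reportFormat/defaultFileName"));
        // The bare element name is not a key, so the lookup returns null.
        System.out.println(table.get("defaultFileName"));
        // A repeated element keeps all of its values in insertion order.
        System.out.println(table.get("/config/reportFormat/reportTo"));
    }
}
```

Running main prints [c:/productMarketing/products.xml], [c:/CEO/monthlyReport.xml], null, and [M. Murata, K. Kosaka], matching the bracketed list notation used in the table.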
Now let us consider how to map an XML file into such a hash table. We use a common technique: generating path expressions from SAX events. Look at the SAX handler in Listing 8.13.[4]
Listing 8.13 Config class, chap08/hashtable/Config.java
package chap08.hashtable;

import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Config extends DefaultHandler {
    private StringBuffer path;          // current path expression, e.g. /config/productData
    private StringBuffer textContent;   // text of the current element; null when not collecting
    private Hashtable<String, Vector<String>> hashtable;                  // line 17

    public Config(String fn)
        throws SAXException, IOException, ParserConfigurationException
    {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        this.path = new StringBuffer();
        this.hashtable = new Hashtable<String, Vector<String>>();             // line 24
        // parse(File, ...) avoids treating a file name such as c:/... as a URI
        parser.parse(new File(fn), this);
    }

    public void startElement(String uri, String local, String qname, Attributes atts)
        throws SAXException
    {
        // Update path
        path.append('/');
        path.append(qname);
        int nattrs = atts.getLength();
        for (int i = 0; i < nattrs; i++) {
            // One entry per attribute, keyed as <path>/@<attribute name>
            addValue(path.toString() + "/@" + atts.getQName(i), atts.getValue(i));  // line 34
        }
        this.textContent = new StringBuffer();                                // line 36
    }

    public void endElement(String uri, String local, String qname)
        throws SAXException
    {
        if (this.textContent != null) {
            addValue(path.toString(), this.textContent.toString());           // line 41
            this.textContent = null;
        }
        // Restore path: remove "/" plus the element name
        int pathlen = path.length();
        path.delete(pathlen - qname.length() - 1, pathlen);                   // line 46
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        if (this.textContent != null) {
            this.textContent.append(ch, start, length);
        }
    }

    Hashtable<String, Vector<String>> getHashtable() {
        return this.hashtable;
    }

    // Append a value to the Vector stored under key, creating the Vector on first use
    void addValue(String key, String value) {
        Vector<String> v = this.hashtable.get(key);
        if (v == null) {
            v = new Vector<String>();
            this.hashtable.put(key, v);
        }
        v.add(value);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: java chap08.hashtable.Config file");
            System.exit(1);
        }
        Config theConfig = new Config(args[0]);
        Hashtable<String, Vector<String>> ht = theConfig.getHashtable();
        // Hashtable enumeration order is unspecified
        for (Enumeration<String> e = ht.keys(); e.hasMoreElements(); ) {
            String key = e.nextElement();
            System.out.println(key + "=" + ht.get(key));
        }
    }
}
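The hash table that we are going to build is declared in line 17 and initialized in line 24. A new entry is added to the hash table when an attribute (line 34) or an element with some text content (line 41) is found. The text content of an element is accumulated in a StringBuffer held in a variable named textContent. This buffer is created when a start tag is found (line 36) and discarded when an end tag is found. Therefore, any characters that arrive after an end tag and before the next start tag are ignored. A container element such as productData gets no entry of its own either: the buffer created at its start tag is replaced when its first child's start tag arrives, and it is null again by the time the container's own end tag arrives. This is acceptable because our configuration documents do not use mixed content, that is, elements that contain both text and child elements.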
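The buffering rule is easiest to see on tiny documents. The following sketch is a stripped-down, hypothetical handler (BufferRuleDemo) that follows the same rule as Listing 8.13: start a buffer at each start tag, flush it at the end tag, and drop it after the flush. To stay short it ignores attributes, keeps only one value per path, parses from a string instead of a file, and uses StringBuilder in place of StringBuffer. It assumes the JDK's built-in SAX parser.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo of the text-buffering rule used in Listing 8.13.
public class BufferRuleDemo extends DefaultHandler {
    private final StringBuilder path = new StringBuilder();
    private StringBuilder text;                          // null when not collecting
    private final Map<String, String> entries = new TreeMap<>();

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        path.append('/').append(qname);
        text = new StringBuilder();   // replaces any buffer the parent had started
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        if (text != null) {           // only still non-null for leaf elements
            entries.put(path.toString(), text.toString());
            text = null;              // text after this end tag is ignored
        }
        path.setLength(path.length() - qname.length() - 1);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    // Parses an XML string and returns the path -> text entries collected.
    public static Map<String, String> parse(String xml) {
        BufferRuleDemo handler = new BufferRuleDemo();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.entries;
    }

    public static void main(String[] args) {
        // Element-only content: only the leaf <b> produces an entry.
        System.out.println(parse("<a><b>x</b></a>"));
        // Mixed content: "before" is lost when <b> starts, "after" because the buffer is null.
        System.out.println(parse("<a>before<b>x</b>after</a>"));
    }
}
```

Both calls in main print {/a/b=x}. In the second document the text around <b> silently disappears, which is why the mapping assumes no mixed content.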
Another interesting point of this program is how it builds path expressions. During parsing, the current path expression is kept in a StringBuffer in a variable named path. We do not need a stack to keep track of it: when we see an end tag, we can always recover the parent path expression by removing the element name plus one character (the separator "/"), as we do in line 46. This works because a well-formed document always closes elements in the reverse order in which it opened them.
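The stack-free path update can be checked without a parser. This sketch (PathTrackerDemo, a hypothetical name) replays a hand-written sequence of start and end events, encoded as "+name" and "-name" purely for this example, and records the path after each event using the same append-and-delete steps as Listing 8.13. It assumes the events are well formed, so each end event names the most recently opened element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of maintaining a path expression without a stack.
public class PathTrackerDemo {

    // Replays events such as "+config" (start tag) and "-config" (end tag)
    // and returns the path expression after each event.
    public static List<String> trace(String... events) {
        StringBuilder path = new StringBuilder();
        List<String> paths = new ArrayList<>();
        for (String event : events) {
            String name = event.substring(1);
            if (event.charAt(0) == '+') {
                // Start tag: append the separator and the element name.
                path.append('/').append(name);
            } else {
                // End tag: remove the element name plus one character for "/".
                int len = path.length();
                path.delete(len - name.length() - 1, len);
            }
            paths.add(path.toString());
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = trace("+config", "+productData", "+defaultFileName",
                "-defaultFileName", "-productData", "+customerData", "-customerData", "-config");
        for (String p : paths) {
            System.out.println(p.isEmpty() ? "(empty)" : p);
        }
    }
}
```

The printed paths grow and shrink one step at a time (/config, /config/productData, /config/productData/defaultFileName, back to /config/productData, and so on), ending with (empty) after the root's end tag.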
Once an XML configuration file is mapped into a hash table, configuration parameters can be accessed efficiently from any part of our program. A great advantage of this approach is that our mapping code does not hardcode any element or attribute names, so it need not be modified when new configuration parameters are added. In fact, this mapping code works with any input document regardless of its schema, provided the document does not rely on mixed content.
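One way other parts of a program might consume the table is through a thin read-only view. ConfigView, its methods, and the parameter /config/reportFormat/copies used below are hypothetical; the sketch only assumes the path-to-Vector layout produced by Listing 8.13. Because every lookup goes through a path string, a newly added parameter becomes readable as soon as it appears in the file, with no change to the mapping code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Vector;

// Hypothetical read-only view over a path -> list-of-values table.
public class ConfigView {
    private final Map<String, ? extends List<String>> table;

    public ConfigView(Map<String, ? extends List<String>> table) {
        this.table = table;
    }

    // First value stored under the path, or defaultValue if the path is absent.
    public String get(String path, String defaultValue) {
        List<String> values = table.get(path);
        return (values == null || values.isEmpty()) ? defaultValue : values.get(0);
    }

    // Every value stored under the path, in stored order; empty if the path is absent.
    public List<String> getAll(String path) {
        List<String> values = table.get(path);
        return values == null ? Collections.<String>emptyList()
                              : Collections.unmodifiableList(values);
    }

    public static void main(String[] args) {
        // Filled by hand here; in the book's setting it could come from Config.getHashtable().
        Hashtable<String, Vector<String>> table = new Hashtable<>();
        table.put("/config/customerData/userId", new Vector<>(Arrays.asList("maruyama")));
        table.put("/config/reportFormat/reportTo",
                new Vector<>(Arrays.asList("M. Murata", "K. Kosaka")));

        ConfigView config = new ConfigView(table);
        System.out.println(config.get("/config/customerData/userId", "guest"));
        // Hypothetical parameter absent from the table, so the default is returned.
        System.out.println(config.get("/config/reportFormat/copies", "1"));
        System.out.println(config.getAll("/config/reportFormat/reportTo"));
    }
}
```

Running main prints maruyama, then 1, then [M. Murata, K. Kosaka]. Supplying defaults at the call site is one design choice among several; a component could equally treat a missing path as an error.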
This mapping is optimized for keyed access by path expressions. It does not suit applications with other access patterns, such as traversing the entire document, because the hash table keeps neither the document's tree structure nor the relative order of different elements.