ZwiftBooks and DOM
The Document Object Model (DOM) is an application programming interface that provides access to an XML document represented in tree form. DOM traces its roots to pre-XML days, when various browsers developed their own incompatible tree-based interfaces for manipulating HTML with JavaScript. To help address this problem, the W3C came up with a series of platform-neutral and language-neutral specifications that defined the W3C Document Object Model (W3C DOM).
With DOM, a document is accessed in tree form, which requires that the entire document be parsed and held in memory. This arrangement makes DOM most suitable for applications that access document elements repeatedly and in an unpredictable sequence. If the application only makes a sequential or one-time selective read or write per processed document, the cost of building the whole tree is considerable overhead. In return, DOM supports a rich interface with methods for retrieving, modifying, and adding elements and attributes. Because the tree can be changed or built from scratch, an application can use DOM to create XML, something that the SAX event interface is not designed to do.
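As a rough illustration of creating XML through the DOM interface, the sketch below builds a tiny document in memory and serializes it with the JAXP identity transformer. The class name, the `order` and `book` element names, and the sample values are assumptions made for this example and are not taken from the ZwiftBooks order format.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildOrder {

    // Builds a one-book order tree in memory and returns it as XML text.
    // The element and attribute names are illustrative assumptions.
    static String buildOrderXml(String isbn, String title) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        // Create the root element and attach it to the document.
        Element order = doc.createElement("order");
        doc.appendChild(order);

        // Create a child element with an attribute and text content.
        Element book = doc.createElement("book");
        book.setAttribute("isbn", isbn);
        book.setTextContent(title);
        order.appendChild(book);

        // A Transformer created without a stylesheet copies the tree to the output.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values, not real catalog data.
        System.out.println(buildOrderXml("1234567890", "Sample Title"));
    }
}
```

The same `Document` could just as well come from parsing an existing file, be modified in place, and then be written back out in the same way.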
DOM specifications are divided into levels, which have evolved over time. To claim support for a level, an application must implement all the requirements of the claimed level and the underlying levels. Table 1 describes the various DOM levels and what they include.
Table 1 Levels of DOM.
Level | Description
0 | Technically, not a formal specification published by the W3C, but a shorthand that refers to what existed before the standardization process.
1 | Navigation of DOM (HTML and XML) document (tree structure) and manipulation of content; includes adding elements.
2 | XML namespace support, filtered views, and events.
3 | Consists of six different specifications, including DOM loading and saving, XPath, and validation.
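An application can ask a DOM implementation which features and versions it claims to support through `DOMImplementation.hasFeature`. The sketch below probes a handful of feature/version pairs drawn from the Level 1 to Level 3 specifications. The class name and the choice of pairs are assumptions for this example, and the answers depend on whichever parser JAXP selects at run time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

class DomLevelProbe {

    // Asks the default JAXP DOM implementation which features it claims.
    static Map<String, Boolean> probe() throws Exception {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();

        // Feature name and version pairs; the selection is illustrative only.
        String[][] queries = {
            {"XML", "1.0"},
            {"Core", "2.0"},
            {"Events", "2.0"},
            {"LS", "3.0"},
            {"XPath", "3.0"}
        };

        Map<String, Boolean> claimed = new LinkedHashMap<>();
        for (String[] query : queries) {
            claimed.put(query[0] + " " + query[1], impl.hasFeature(query[0], query[1]));
        }
        return claimed;
    }

    public static void main(String[] args) throws Exception {
        // Prints one line per probed feature; results vary by parser.
        probe().forEach((feature, supported) ->
                System.out.println(feature + ": " + supported));
    }
}
```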
Listing 2 gives the DOM version of the SAX program from Listing 1. While this example doesn’t show the full capability of DOM, it does illustrate the structure of an XML application that uses DOM.
Listing 2 DOM version of factory ISBN alert program.
1. import javax.xml.parsers.*;
2. import org.xml.sax.SAXException;
3. import org.xml.sax.SAXParseException;
4.
5. import java.io.File;
6. import java.io.IOException;
7.
8. import org.w3c.dom.*;
9.
10.
11. public class DOMGetIsbn {
12.
13.     static String orderFileName = "EuroBooks.xml";
14.
15.     public static void main(String[] args) {
16.
17.         Document document = null;
18.
19.         DocumentBuilderFactory factory =
20.             DocumentBuilderFactory.newInstance();
21.
22.         try {
23.             DocumentBuilder builder = factory.newDocumentBuilder();
24.             document = builder.parse(new File(orderFileName));
25.
26.         } catch (SAXParseException spe) {
27.             System.out.println("SAX parse Exception");
28.             return; // no tree to query
29.         } catch (Exception e) {
30.             System.out.println("OTHER parse Exception");
31.             System.out.println(e);
32.             return; // no tree to query
33.         }
34.
35.         // the XML file has now been read into the DOM tree
36.         NodeList nodeList = document.getElementsByTagName("book");
37.
38.         // determine number of book elements
39.         int length = nodeList.getLength();
40.
41.         String isbn = null;
42.         for (int i = 0; i < length; ++i) {
43.             // each list entry is an Element node in the tree
44.             isbn = ((Element) nodeList.item(i)).getAttribute("isbn");
45.             alertWarehouse(isbn);
46.         }
47.     }
48.
49.     // Placeholder: the listing calls alertWarehouse without showing its body.
50.     static void alertWarehouse(String isbn) {
51.         System.out.println("Alert warehouse: " + isbn);
52.     }
53. }
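One of the most useful retrieval methods in DOM is getElementsByTagName(String elementName), which returns a NodeList of every element in the document whose tag name matches the elementName parameter, in document order. Lines 42–46 illustrate simple iteration over this list and the extraction of the isbn attribute using the getAttribute method. Note that getAttribute returns an empty string, not null, when the element has no such attribute.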
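The retrieval pattern can also be tried without a file on disk. The following sketch parses XML from a string and collects the isbn values into a list, using hasAttribute to skip book elements that carry no isbn. The class name, method name, and sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IsbnCollector {

    // Returns the isbn attribute of every book element, in document order.
    static List<String> collectIsbns(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Matches book elements at any depth, not only direct children of the root.
        NodeList books = doc.getElementsByTagName("book");

        List<String> isbns = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            // getAttribute would return "" here, so test for presence first.
            if (book.hasAttribute("isbn")) {
                isbns.add(book.getAttribute("isbn"));
            }
        }
        return isbns;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: one nested book and one book without an isbn.
        String xml = "<order><book isbn=\"111\"/>"
                + "<shelf><book isbn=\"222\"/></shelf><book/></order>";
        System.out.println(collectIsbns(xml)); // prints [111, 222]
    }
}
```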
It’s important to note here that the list of elements returned by DOM consists of pointers into the underlying DOM tree. From these node references, we can navigate to node children and parents. With DOM, we have access to the context of the nodes we asked for, unlike with SAX, which simply leaves us with a textual representation of the XML data.
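To make the point about context concrete, the sketch below takes the first book element returned by getElementsByTagName and walks from it to its parent and to its child elements. The class name, method name, and sample XML are assumptions chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomNavigate {

    // Describes the first book element as "parent > book [child names]".
    static String describeFirstBook(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList books = doc.getElementsByTagName("book");
        Element book = (Element) books.item(0);

        // The node is a reference into the tree, so its parent is reachable.
        String parentName = book.getParentNode().getNodeName();

        // Walk the children, keeping element nodes and skipping text nodes.
        List<String> childNames = new ArrayList<>();
        for (Node child = book.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                childNames.add(child.getNodeName());
            }
        }
        return parentName + " > " + book.getNodeName() + " " + childNames;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><book isbn=\"111\">"
                + "<title>A</title><price>9</price></book></order>";
        System.out.println(describeFirstBook(xml)); // prints order > book [title, price]
    }
}
```

A SAX handler would have to record this kind of context itself as the events stream past, whereas here it comes directly from the tree.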