Seven Steps to XML Mastery, Step 4: Parsing and Processing XML (Part 1 of 2)
- Event Versus Tree Parsing
- ZwiftBooks and SAX
- SAX and State
- ZwiftBooks and DOM
- The JAXP Factory Model
- Summary
- References
Now it’s time to move to step 4 in our series and look at options for working with XML at a programming level. For a company like ZwiftBooks, building a corporate infrastructure around XML implies being able to move XML data into and out of programs seamlessly. This means extracting, modifying, and creating XML by using an XML parser. In this article, we’ll look at how ZwiftBooks can use XML parsing technology to integrate with an existing warehouse alert program.
Event Versus Tree Parsing
XML parsers fall into two categories:
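- Streaming event parsers such as Simple API for XML (SAX) and Streaming API for XML (StAX) report parsing events to the application as the document is read. SAX pushes events to handlers defined as part of the application, while StAX lets the application pull the next event on request.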
- Tree-based parsers such as the Document Object Model (DOM) build an XML tree and provide methods to navigate the tree.
Figure 1 illustrates the two major families of parsers for programmatically working with XML. Both event-based parsers and tree-based parsers take an XML document as input, but the two types of parsers treat that XML very differently.
Figure 1 Event versus tree parsing for XML documents.
Event-Based APIs
An event-based API reports parsing events to your application as the XML streams through the parser. With SAX, this happens through callbacks: your handler is called as the parser encounters events of interest, such as start of document, start of element, end of element, and end of document (to name a few). StAX exposes the same kinds of events, but your code requests each one from the parser instead of being called back. Either way, writing a SAX or StAX application means writing code that reacts when an element or attribute of interest is encountered in the XML.
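To make the callback model concrete, here is a minimal sketch of a SAX handler that simply records each event in the order the parser reports it. The class name `SaxEventDemo`, the `EventLogger` handler, and the sample document are illustrative assumptions, not part of the ZwiftBooks application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {

    // Records every callback as a line of text so the event order is visible.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("startElement " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + qName);
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    // Parses the given XML text and returns the events in the order they were reported.
    static List<String> logEvents(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventLogger handler = new EventLogger();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, invented for this sketch.
        String xml = "<book isbn=\"0-000-00000-0\"><title>Sample Title</title></book>";
        for (String event : logEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

Note that the handler never sees the whole document at once. Each callback receives only the element currently being parsed, so any context the application needs has to be tracked by the handler itself.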
Tree-Based APIs
A tree-based API such as the W3C’s DOM maps an XML document into an internal tree structure and provides programmatic interfaces for navigating that tree. Methods are available to determine the child and parent nodes of an element, as well as to extract the content of elements or attributes. With DOM, it’s also possible to modify the tree and thus create new XML.
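The following sketch illustrates these operations with the standard Java DOM interfaces: it parses a small document, reads an attribute and an element's text, walks from a child back to its parent, and then appends a new element before writing the tree back out. The class name `DomTreeDemo`, the helper methods, and the sample data (including the `price` element) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTreeDemo {

    // Builds an in-memory DOM tree from XML text.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Creates a new element containing text and appends it as the last child of parent.
    static Element appendTextElement(Element parent, String name, String text) {
        Element child = parent.getOwnerDocument().createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
        return child;
    }

    // Writes the (possibly modified) tree back out as XML text.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, invented for this sketch.
        Document doc = parse("<book isbn=\"0-000-00000-0\"><title>Sample Title</title></book>");
        Element book = doc.getDocumentElement();
        Element title = (Element) book.getElementsByTagName("title").item(0);

        System.out.println(book.getAttribute("isbn"));           // attribute content
        System.out.println(title.getTextContent());              // element content
        System.out.println(title.getParentNode().getNodeName()); // child to parent

        appendTextElement(book, "price", "19.99");               // modify the tree
        System.out.println(serialize(doc));                      // emit new XML
    }
}
```

Because the whole document is held in memory, the code can move freely between an element and its parent or siblings, which is exactly what an event parser does not offer out of the box.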
Choosing a Parser
The choice of event versus tree parser depends on the application requirements:
- Event-based parsers are good for extracting an element or attribute from some XML and reacting to it in some way. Since event parsers look at only one small part of an XML document at a time, the parser's memory use stays largely independent of document size, so you can parse very large documents. Even documents in the terabyte range can be handled by a SAX or StAX parser.
- Tree-based APIs build a navigable internal representation of a document. This approach is useful for a wide range of applications, but has a heavy impact on system resources, especially with large documents or special data-modeling requirements. For example, building a DOM tree, mapping it onto a new data structure, and discarding the original is typically not worth the effort, because an event parser can populate the new structure directly. However, if data context is important (for example, when the application needs to examine an element's parent or siblings), DOM is the way to go.
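As a rough illustration of the first case, the sketch below uses the StAX cursor API to pull a single element's text out of a stream and stop as soon as it is found, without ever building a tree. The class name `StaxExtractDemo`, the method `firstElementText`, and the sample document are assumptions for this example; it also assumes the target element contains only text.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxExtractDemo {

    // Returns the text of the first element with the given name, or empty if none is found.
    // Assumes the element holds only text; getElementText() fails on nested child elements.
    static Optional<String> firstElementText(Reader in, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                // The application pulls events one at a time; nothing is kept in memory
                // beyond the event currently under the cursor.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        return Optional.of(reader.getElementText());
                    }
                }
                return Optional.empty();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, invented for this sketch.
        String xml = "<book isbn=\"0-000-00000-0\"><title>Sample Title</title></book>";
        System.out.println(firstElementText(new StringReader(xml), "title").orElse("(not found)"));
    }
}
```

The same method works unchanged on a `Reader` wrapped around a very large file, since the loop returns as soon as the element of interest has been read.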