Parser Configuration in JAXP
Once upon a time, parsers were a specialized area in their own right. Occupying pride of place in compiler design, parsers played a key role in the processing of source code. This is still true today, but parsers have also become an everyday utility. They are deployed across the full spectrum of computing devices, from the smallest mobile devices all the way up to massive clustered web servers. We all use parsers in many different applications, including browsers and compilers, and increasingly to bridge gaps between data technologies.
Java provides powerful facilities for influencing the presentation of parsed data. You can take advantage of sophisticated parsing models with just a small amount of Java code. The value added by these parsing models extends beyond parsing into the presentation of legacy data as XML or HTML. In this article, the focus is XML-to-HTML transformation; this capability helps bridge the gap between legacy software and the latest standards produced by the XML-DEV group and by the W3C. Bridging this gap can extend the lifespan of legacy code and help reduce the associated cost of ownership.
One of my earlier articles, "Saving Money with Legacy Data," explained how to transform legacy data into XML by using XSLT. I'll use the XML data from that article as the input for this one. We'll also explore Java API for XML Processing (JAXP) parser configuration, in particular taking XML data and transforming it into HTML that a browser can display. To do this, I'll make further use of Extensible Stylesheet Language (XSL).
Events or Trees?
JAXP uses the parser standards Simple API for XML (SAX) and Document Object Model (DOM), so you can either process your data as a series of events or build a tree representation of it. Each standard has advantages:
- SAX uses less memory (because each data element is handed to the application as it is read, and processed separately), whereas DOM reads all the data into memory.
- SAX may be more suitable for server-side processing, while DOM may suit interactive client-side applications.
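To make the contrast concrete, here is a minimal sketch that counts the same elements once with SAX events and once with a DOM tree. The class name, the method names, and the tiny customer document are assumptions made up for illustration; they are not the data from the earlier article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVsDomSketch {
    // Hypothetical sample data, invented for this sketch.
    static final String XML =
        "<customers>"
      + "<customer><name>Ann</name></customer>"
      + "<customer><name>Bob</name></customer>"
      + "</customers>";

    // SAX: the parser calls back for each element as it is read;
    // nothing is retained except the running count.
    static int countWithSax(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("customer".equals(qName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    // DOM: the whole document is built in memory first,
    // and the tree is queried afterwards.
    static int countWithDom(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("customer").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX count: " + countWithSax(XML));
        System.out.println("DOM count: " + countWithDom(XML));
    }
}
```

Both methods report the same count. The difference is that the SAX version never holds more than one event at a time, while the DOM version keeps the full tree available for later navigation.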
JAXP also supports the Extensible Stylesheet Language Transformations (XSLT) standard, which gives you control over the presentation of data. This capability facilitates conversion of data between XML and other formats, such as plain ASCII text and HTML.
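As a rough illustration of the XSLT support, the following sketch uses the javax.xml.transform API to turn a small XML document into HTML. The stylesheet, the sample data, and the class and method names are all hypothetical, chosen only to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltToHtmlSketch {
    // Hypothetical input document, invented for this sketch.
    static final String XML =
        "<customers><customer><name>Ann</name></customer></customers>";

    // Hypothetical stylesheet: renders each customer name as a list item.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/customers\">"
      + "<html><body><ul>"
      + "<xsl:for-each select=\"customer\">"
      + "<li><xsl:value-of select=\"name\"/></li>"
      + "</xsl:for-each>"
      + "</ul></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet, applies it to the XML, and returns the result.
    static String toHtml(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(XML, XSL));
    }
}
```

The exact whitespace of the output depends on the XSLT processor in use, but the result should contain one list item per customer name.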
JAXP is also very flexible; it lets an application use any compliant XML parser. This is done with what's called a pluggability layer, which lets you plug in an alternative implementation of the SAX or DOM API in place of the default one. The pluggability layer also allows you to plug in an XSL processor, giving you control over how your XML data is displayed.
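One way to see the pluggability layer at work is to ask each JAXP factory which concrete class was actually loaded. The sketch below does only that; the class and method names are assumptions for illustration, and the implementation names it prints will vary with the Java runtime and classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

public class PluggabilitySketch {
    // Each newInstance() call resolves a concrete implementation at run time.
    // A system property named after the factory class, for example
    // -Djavax.xml.parsers.DocumentBuilderFactory=<implementation class>,
    // is one of the places the lookup consults before falling back
    // to the platform default.
    static String domImplementation() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String saxImplementation() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    static String xsltImplementation() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM factory:  " + domImplementation());
        System.out.println("SAX factory:  " + saxImplementation());
        System.out.println("XSLT factory: " + xsltImplementation());
    }
}
```

Because application code depends only on the abstract factory classes, swapping in a different parser or XSL processor should not require changes to the code that does the parsing or transforming.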