XML and Java: Bridging Application Data Structure and XML
8.1 Introduction
The theme of this book is how XML and Java interact with each other. In Chapter 2 (parsing) we explained how to transform an XML document into a Java data structure based on DOM and SAX. Chapter 3 (generation) showed how to generate an XML document from a Java program. Chapter 4 (DOM/DOM2) and Chapter 5 (SAX/SAX2) dealt with standard APIs for accessing an XML document from a Java program. Common to these techniques is the concept of mapping between XML documents and Java data structures. However, these are not the only ways to map between XML and Java. This chapter introduces various mapping patterns and techniques.
As we discussed in Chapter 1, and as we will see in Chapters 12 (messaging) and 13 (Web services), XML is a data format suitable for data exchange and is not necessarily suitable for processing. From an application programmer's point of view, XML documents exist in an external data format only, and once they are read into memory, the programmer deals with the internal data structure, that is, Java objects for implementing application-specific logic. XML processors are responsible for converting XML documents into Java data in the form of DOM or SAX, but these data structures rarely represent your application's data structure. For example, suppose that you parse a purchase order document and receive a DOM structure. You need the customer's name and the serial number to process the data. From a <customer> element, you may need to scan its child nodes to find a <name> node and a <serialNumber> node and then convert them into appropriate Java data types. Instead of a DOM tree, what the application programmer really wants is Java objects reflecting the application data structure, such as class Customer. This class has the name and serialNumber fields, and these fields are to be filled with the data extracted from the XML document. This eliminates the extra code of scanning a DOM tree and simplifies the application code. Therefore, it is common that application programmers convert a DOM tree or a SAX event stream into an application-specific data representation before any application-specific process is executed.
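The conversion described above can be sketched as follows. This is only an illustration under stated assumptions: the wrapper class CustomerMapping, the sample document, and the choice of int for serialNumber are hypothetical and do not come from the text, and the sketch uses getTextContent from DOM Level 3, which is newer than the DOM/DOM2 APIs covered in Chapter 4.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CustomerMapping {
    /** Application-specific class; the field names follow the text. */
    public static class Customer {
        public String name;
        public int serialNumber; // assumption: the serial number is an integer
    }

    /** Scans the child nodes of a <customer> element and fills a Customer. */
    static Customer fromElement(Element customerElement) {
        Customer c = new Customer();
        for (Node n = customerElement.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, and so on
            }
            String text = n.getTextContent().trim();
            if ("name".equals(n.getNodeName())) {
                c.name = text;
            } else if ("serialNumber".equals(n.getNodeName())) {
                c.serialNumber = Integer.parseInt(text); // convert to a Java type
            }
        }
        return c;
    }

    /** Parses a document and maps its first <customer> element. */
    static Customer parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList customers = doc.getElementsByTagName("customer");
            if (customers.getLength() == 0) {
                throw new IllegalArgumentException("no <customer> element found");
            }
            return fromElement((Element) customers.item(0));
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical purchase order fragment, for illustration only.
        String xml = "<purchaseOrder><customer><name>Robert Smith</name>"
                   + "<serialNumber>1234</serialNumber></customer></purchaseOrder>";
        Customer c = parse(xml);
        System.out.println(c.name + " / " + c.serialNumber);
    }
}
```

Once the scanning code is isolated in fromElement, the rest of the application can work with Customer objects and never touch the DOM tree.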
In the programming language literature, the concept of mapping between internal data structures and external octet sequences is common, and the terms marshal and unmarshal are used to describe the two directions of the mapping (see Figure 8.1). An XML document is an octet stream. Therefore, parsing an XML document can be considered unmarshaling, and generating an XML document can be considered marshaling.
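To make the two directions concrete, here is a minimal round-trip sketch using the standard JAXP classes. The class name MarshalSketch, the method names marshal and unmarshal, and the one-element document layout are assumptions chosen for illustration; they are not part of any standard API or of the text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MarshalSketch {
    /** Marshal: internal data (a Java string) to an external octet sequence. */
    static byte[] marshal(String customerName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element customer = doc.createElement("customer");
            Element name = doc.createElement("name");
            name.setTextContent(customerName);
            customer.appendChild(name);
            doc.appendChild(customer);

            // An identity transformation serializes the DOM tree as XML.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("marshaling failed", e);
        }
    }

    /** Unmarshal: external octet sequence back to internal data. */
    static String unmarshal(byte[] octets) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(octets));
            return doc.getDocumentElement()
                      .getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("unmarshaling failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] octets = marshal("Smith & Sons");
        System.out.println(octets.length + " octets");
        System.out.println(unmarshal(octets));
    }
}
```

The application only ever sees the string; the byte array is the external form that would be written to a file or sent over a network.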
Figure 8.1 Marshaling and unmarshaling
In this chapter, we explain that there are certain patterns in mappings between XML documents and application data. In Section 8.2, we consider mappings where the application data structure and the XML document structure are isomorphic. If the application data structure is slightly different from the input XML document structure, the use of XSLT to adjust the structure is a standard technique. We explain this technique in Section 8.3. Two-dimensional arrays, or tables, are also a common data structure. In Section 8.4, we briefly discuss tables as the application data structure. The general technique of mapping between XML documents and relational tables is covered in detail in Chapter 11, XML and Databases. However, we explain mapping for one special type of table, the hash table, in this chapter. Section 8.5 shows a useful technique of representing an XML document as a hash table. In more complex cases, the application data structure may be represented as a graph. We give an example of mapping an XML document into a graph structure in Section 8.6. In Chapter 15, Data Binding, we revisit mappings and explore how to automate mappings between the application data structure and the XML document structure.