- Databases Working Overtime
- Binary Data in an XML World
- Binary Data in a Relational Database
- Conclusion
Binary Data in an XML World
Typically, binary data is a representation intended to be read by machines or applications rather than by people. A text file, such as an XML file, isn’t binary. Yet in many cases XML files are both machine-generated and machine-consumed; in other words, many XML text files are never read directly by humans. This is the rather paradoxical aspect of XML technology: it’s text-based and non-binary, but it’s often used as though it were binary. A Word document, by contrast, is a genuine binary file: the raw text is stored in the file along with all the formatting data.
One reason XML is used both as a text format and as a de facto binary format is its widespread adoption. XML has become the lingua franca of the data world. A wide range of tools is available for manipulating XML, and the Java platform ships with standard XML processing APIs (JAXP) for parsing and transforming it. So, XML is here to stay!
Binary data is often quite compact. By comparison, XML files are bulky, because they wrap every data item in descriptive markup that tells the reader what the item is. This descriptive information helps receiving programs understand and decode the contained data. Listing 1 shows an excerpt from an XML file that contains IT management event data.
Listing 1 An XML file excerpt.
<?xml version="1.0" encoding="utf-8"?>
<LogDetails>
  <LogItem>
    <InstanceId>1073758988</InstanceId>
    <MachineName>MACHINE11</MachineName>
    <TimeGenerated>9/7/2007 10:32:00</TimeGenerated>
    <Category>Server</Category>
    <EntryType>Information</EntryType>
    <Message>This is an informational message; no user action required.</Message>
  </LogItem>
</LogDetails>
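As a small sketch of the XML support built into the Java platform, the following class uses the standard JAXP DOM parser (javax.xml.parsers) to pull individual values out of a document shaped like Listing 1, locating each value by its element name rather than by its position. The class name LogItemReader and the method readField are hypothetical, chosen for illustration, and the string in main is an abbreviated version of Listing 1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class LogItemReader {

    // Returns the text of the named element inside the first <LogItem>,
    // or null if there is no <LogItem> or no such element.
    static String readField(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element item = (Element) doc.getElementsByTagName("LogItem").item(0);
            if (item == null) {
                return null;
            }
            // The element name, not its position, tells the client what the value means.
            NodeList matches = item.getElementsByTagName(elementName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML document", e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated version of Listing 1
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<LogDetails><LogItem>"
                + "<InstanceId>1073758988</InstanceId>"
                + "<MachineName>MACHINE11</MachineName>"
                + "<EntryType>Information</EntryType>"
                + "</LogItem></LogDetails>";
        System.out.println(readField(xml, "MachineName"));
        System.out.println(readField(xml, "EntryType"));
    }
}
```

Run as is, main prints MACHINE11 followed by Information. For brevity, readField parses the whole document on every call; a real client would parse once and query the resulting Document.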
Notice in Listing 1 that each data item is enclosed in a pair of XML tags: the data is sandwiched between a start tag and a matching end tag. For instance, look at the following element:
<MachineName>MACHINE11</MachineName>
The start tag <MachineName> is followed by the data MACHINE11 and then the end tag </MachineName>. For small data sets, the overhead of the tags is negligible; but when you’re sending XML files that contain millions of such records, that overhead adds up quickly.
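To put rough numbers on that overhead, here is a minimal sketch that counts how many characters of an XML fragment belong to tags rather than to data. It assumes simple markup with no comments, CDATA sections, or > characters inside attribute values, and the class name MarkupOverhead is hypothetical.

```java
public class MarkupOverhead {

    // Counts every character from '<' through the matching '>'.
    // Characters outside tags, including any whitespace between elements, count as data.
    static int markupChars(String xml) {
        int count = 0;
        boolean inTag = false;
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '<') {
                inTag = true;
            }
            if (inTag) {
                count++;
            }
            if (c == '>') {
                inTag = false;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String element = "<MachineName>MACHINE11</MachineName>";
        int markup = markupChars(element);
        int data = element.length() - markup;
        System.out.printf("markup=%d data=%d (%.0f%% markup)%n",
                markup, data, 100.0 * markup / element.length());

        // Scale up to one million such elements, at one byte per ASCII character.
        long records = 1_000_000L;
        System.out.printf("1M elements: %d bytes of tags vs %d bytes of data%n",
                records * markup, records * data);
    }
}
```

For the <MachineName> element above, the count is 27 characters of tags against 9 characters of data, so three quarters of the element is markup. Across a million such elements, that comes to 27 million bytes of tags carrying 9 million bytes of data.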
One other consideration in favor of XML files is our old friend from my earlier articles: separation of concerns. By including data descriptions inside an XML file, you give client software important information about the data. This allows a looser relationship between the software that creates the data and the software that consumes it, because the client doesn’t have to request more information from the data supplier in order to interpret what it receives. However, it’s important to understand that this cooperation doesn’t come for free: the XML overhead can be high, particularly in high-performance environments such as bank transaction reconciliation. If the client already understands the data format, the data can be streamed in a compact binary form without the XML overhead.
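The following sketch illustrates that idea with java.io.DataOutputStream: the fields of the LogItem from Listing 1 are written in a fixed order with no tags or field names, and the reader recovers them only because it knows the same order. The record layout, the class names, and the choice to store TimeGenerated as UTC epoch seconds (reading 9/7/2007 as 7 September 2007) are assumptions made for illustration, not a published format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class BinaryLogItem {

    // Hypothetical in-memory form of one <LogItem> from Listing 1.
    static final class LogItem {
        final int instanceId;
        final String machineName;
        final long timeGenerated; // seconds since the epoch, assumed UTC
        final String category;
        final String entryType;
        final String message;

        LogItem(int instanceId, String machineName, long timeGenerated,
                String category, String entryType, String message) {
            this.instanceId = instanceId;
            this.machineName = machineName;
            this.timeGenerated = timeGenerated;
            this.category = category;
            this.entryType = entryType;
            this.message = message;
        }
    }

    // Writes the fields in a fixed, agreed order: no tags, no field names.
    static byte[] encode(LogItem item) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(item.instanceId);      // 4 bytes
            out.writeUTF(item.machineName);     // 2-byte length + encoded characters
            out.writeLong(item.timeGenerated);  // 8 bytes
            out.writeUTF(item.category);
            out.writeUTF(item.entryType);
            out.writeUTF(item.message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the fields back; this works only because the order is known in advance.
    static LogItem decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Java evaluates constructor arguments left to right, matching the write order.
            return new LogItem(in.readInt(), in.readUTF(), in.readLong(),
                    in.readUTF(), in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static LogItem sample() {
        long seconds = LocalDateTime.of(2007, 9, 7, 10, 32, 0).toEpochSecond(ZoneOffset.UTC);
        return new LogItem(1073758988, "MACHINE11", seconds, "Server", "Information",
                "This is an informational message; no user action required.");
    }

    public static void main(String[] args) {
        byte[] data = encode(sample());
        LogItem back = decode(data);
        System.out.println(data.length + " bytes; machine = " + back.machineName);
    }
}
```

For the sample item, encode produces 104 bytes. The trade-off is the one described above: if the producer changes the field order or adds a field, every consumer has to change with it, whereas a consumer that looks XML elements up by name is more loosely coupled to the producer.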
A zip file is one example of a compact binary file. Audio files are another binary data format. They are typically created with some form of recording program, a topic I covered recently in my Java Sound articles. The audio file is written to disk and can be played by piping it into a player program, which determines the file’s format and plays back the contained audio.
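As a rough illustration of a program determining an audio file’s format, this sketch uses the standard javax.sound.sampled API: it writes one second of silence to a temporary WAV file with AudioSystem.write and then asks AudioSystem.getAudioFileFormat what the file contains. The class and method names are hypothetical, and the 16-bit mono PCM format is simply an assumed example.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class AudioFormatProbe {

    // Writes one second of silence to a temporary WAV file, then reads the
    // file's format back from disk, as a player would before playback.
    static AudioFormat writeAndProbe(float sampleRate) {
        // 16-bit signed PCM, mono, little-endian
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        int frames = (int) sampleRate; // one second of audio
        byte[] silence = new byte[frames * format.getFrameSize()];
        File file = null;
        try {
            file = File.createTempFile("probe", ".wav");
            try (AudioInputStream in = new AudioInputStream(
                    new ByteArrayInputStream(silence), format, frames)) {
                AudioSystem.write(in, AudioFileFormat.Type.WAVE, file);
            }
            return AudioSystem.getAudioFileFormat(file).getFormat();
        } catch (IOException | UnsupportedAudioFileException e) {
            throw new IllegalStateException("Audio round trip failed", e);
        } finally {
            if (file != null) {
                file.delete();
            }
        }
    }

    public static void main(String[] args) {
        AudioFormat detected = writeAndProbe(8000f);
        System.out.println(detected);
    }
}
```

The format returned by getAudioFileFormat is recovered from the file itself, which is the information a player needs before it can decode and play the samples.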
Now you have a picture of what constitutes binary data. How can we store such data in a relational database?