Five Steps to Managing Unstructured Data with Derby
- Databases Working Overtime
- Binary Data in an XML World
- Binary Data in a Relational Database
- Conclusion
Databases Working Overtime
As IT becomes increasingly commoditized, its individual components are called on to deliver greater levels of service. This is as true for the data center as a whole as it is for individual applications. Storage is perhaps the single most critical point of failure in any IT setup: lost applications can always be reinstalled, but if you lose your data, you’re in trouble. The nature of data is changing, too. For example, it’s now common to search through old email to chase down a discussion of some key question. Audiovisual data (sound, pictures, and video) also forms an important part of storage requirements. Podcasts are a case in point, because the audio data is typically downloaded to a machine for later consumption. As a result, data is spread across a broad range of systems and technologies.
From a database perspective, this type of data is called unstructured: it exists in big chunks whose constituent elements are of little use in isolation. You might be surprised to learn that a relational database such as Derby can store and retrieve such data. Another example of unstructured data is a list of the applications installed on a given platform. Managing such unstructured data is the topic of this article.
As with my previous Derby articles [1, 2], I want to answer a key question about the use of this important open source database product. In this article, the question is how to handle unstructured data by using Java. To make it interesting, I’ll demonstrate how to use Derby to store recorded audio data. The database technology I’ll use for this purpose is called binary large objects (BLOBs, for short).
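To give a rough feel for what BLOB handling through JDBC can look like, here is a minimal sketch. It is an illustration only, not this article's own listing: the class name AudioBlobStore, the table audio_clips, its columns, and the 16M size limit are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class; all names here are assumptions for illustration.
class AudioBlobStore {

    // Assumed schema: a clip name plus the raw audio bytes in a BLOB column.
    static final String CREATE_SQL =
        "CREATE TABLE audio_clips ("
        + "name VARCHAR(64) NOT NULL PRIMARY KEY, content BLOB(16M))";

    // Creates the assumed table on an already-open connection.
    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_SQL);
        }
    }

    // Inserts one clip, streaming the bytes into the BLOB column.
    static void store(Connection conn, String name, byte[] audio)
            throws SQLException {
        String sql = "INSERT INTO audio_clips (name, content) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            ps.setBinaryStream(2, new ByteArrayInputStream(audio), audio.length);
            ps.executeUpdate();
        }
    }

    // Returns the clip's bytes, or null if no row has that name.
    static byte[] load(Connection conn, String name)
            throws SQLException, IOException {
        String sql = "SELECT content FROM audio_clips WHERE name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                // Read the BLOB while the result set is still open.
                return rs.next() ? toBytes(rs.getBlob(1)) : null;
            }
        }
    }

    // Copies a Blob's binary stream into a byte array.
    static byte[] toBytes(Blob blob) throws SQLException, IOException {
        try (InputStream in = blob.getBinaryStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

With the Derby embedded driver on the classpath, a connection for these methods could be obtained with something like DriverManager.getConnection("jdbc:derby:audioDB;create=true"), where audioDB is a made-up database name. Streaming the bytes with setBinaryStream, rather than building one large in-memory object, is a common choice for large media files.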
First, though, let’s take a quick detour to learn exactly what binary data is.