Using JDO and HSQLDB To Process Lots of Data in Limited Memory
- Quick Intro to JDO
- The Problem at Hand
- Persisting the Objects
- Retrieving the Objects
- The Wrap-Up
Having more data than memory is a problem that has faced developers since the dawn of computing. These days, the typical way to handle this problem is via a database running on a separate server. But, for a variety of reasons, that's not always an option. Another common solution is to use a file-based collection mechanism, such as a B-tree or the like. But there's no standard implementation of a B-tree in Java, and developing one that works well is no small task.
In this article, we'll look at another option, a sort of combination of the two I've just mentioned. We'll make use of the HSQL Database Engine (HSQLDB), a lightweight open source database that can be instantiated as needed and configured to store its data on the filesystem. Rather than working directly with the database via JDBC, however, we'll store and retrieve our data via Sun's Java Data Objects extension, with the aim of avoiding a lot of excess coding without giving up too much performance.
Quick Intro to JDO
Java Data Objects (JDO) is Sun's take on object persistence for Java. Object persistence is basically the automated storage of objects in a database, and their retrieval from it, without the code having to worry about any database internals. Accessing a database via JDBC is not complicated in Java, but the code to store and retrieve objects can get a bit messy, especially when you're dealing with inter-object relationships. The result is often lots of tedious extra code to develop and maintain, which of course is bad news.
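To make that concrete, here is a minimal sketch of what hand-written JDBC mapping tends to look like for a single, flat class. The Customer class, the CUSTOMER table, and its column names are hypothetical and exist only for illustration; the java.sql calls are the standard ones.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical domain class, used only for this illustration. */
class Customer {
    final int id;
    final String name;
    final String email;

    Customer(int id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

/** Hand-written mapping between Customer objects and an assumed CUSTOMER table. */
class CustomerJdbcMapper {
    static final String INSERT_SQL =
        "INSERT INTO CUSTOMER (ID, NAME, EMAIL) VALUES (?, ?, ?)";
    static final String SELECT_SQL =
        "SELECT ID, NAME, EMAIL FROM CUSTOMER WHERE ID = ?";

    /** Copies each field into the matching statement parameter by hand. */
    static void insert(Connection conn, Customer c) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, c.id);
            ps.setString(2, c.name);
            ps.setString(3, c.email);
            ps.executeUpdate();
        }
    }

    /** Rebuilds a Customer from a row, or returns null if no row matches. */
    static Customer findById(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_SQL)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                return new Customer(
                    rs.getInt("ID"), rs.getString("NAME"), rs.getString("EMAIL"));
            }
        }
    }
}
```

Every new field means touching both SQL strings and both methods, and a reference from Customer to another persistent class would need its own table, key handling, and lookup code on top of this.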
JDO hides all that away. Objects are mapped to tables in the database outside your code, and while you may need to tweak some of the mappings yourself, most JDO packages handle the dirty work. We'll use JDO Genie for this article, because it supports HSQLDB and ships with easy-to-use tools. JDO Genie includes a GUI Workbench application for handling tasks such as selecting your database implementation, mapping classes to tables, and creating those tables (along with a number of other monitoring functions that go beyond the scope of this article).
Both HSQLDB and JDO Genie can be run in server mode or standalone mode, allowing you to run them locally or serve others remotely. We'll run everything standalone for this article.
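On the HSQLDB side, the difference between the two modes shows up mainly in the JDBC connection URL. The sketch below assumes the URL formats documented for HSQLDB 1.7.2 and later (a "file:" prefix for an in-process database, "hsql://" for a network server); older releases used a different standalone format, so check the documentation for the version you have. The HsqldbUrls helper and the example path are hypothetical names, not part of HSQLDB or JDO Genie.

```java
/** Builds HSQLDB JDBC URLs for the two run modes (illustrative helper only). */
final class HsqldbUrls {
    private HsqldbUrls() {
    }

    /**
     * Standalone (in-process) mode: the engine runs inside the calling JVM
     * and keeps its data in files at the given path.
     */
    static String standalone(String dbFilePath) {
        return "jdbc:hsqldb:file:" + dbFilePath;
    }

    /**
     * Server mode: the client connects over the network to an HSQLDB
     * server that was started separately.
     */
    static String server(String host, String dbAlias) {
        return "jdbc:hsqldb:hsql://" + host + "/" + dbAlias;
    }
}
```

With the HSQLDB JAR on the classpath, a URL such as HsqldbUrls.standalone("data/bigset") would be passed to java.sql.DriverManager.getConnection, or entered in the JDO implementation's connection settings, which is the route this article takes.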