- Introduction
- Let Me Get This Straight, Apache Derby Is IBM Cloudscape?
- Development of the Apache Derby Database—Who Can Contribute and How?
- How Can IBM Sell a Product for Profit and Contribute the Same Product to the Open Source Community?
- How an Open Source Database Like Apache Derby Can Help
- Why the Need for a Local Data Store?
- Why Use a Relational Database?
- How the Apache Derby Platform Can Help Your Business
- A High-Level View of the Apache Derby Database
- The Apache Derby Components
- Developing Apache Derby Applications
Why Use a Relational Database?
Now that it is apparent how applications and business processes can benefit from a functional place to persist and manage data, it is worth briefly explaining why a relational database is the best choice (as opposed to other approaches, such as a file system).
Many applications today use some sort of local storage format to keep data collocated with the application, or to address the issues mentioned in the previous section. Without a database, however, most of these solutions cannot guarantee the atomicity of multiple changes (all or nothing) or manage transactions that may affect each other. They also offer no built-in recovery mechanism if a failure occurs during a transaction, no parallelism, no set-oriented data access API, and no ability for applications or users to share the same data at the same time according to the business rules.
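To make the all-or-nothing idea concrete, here is a minimal JDBC sketch. It assumes the Apache Derby embedded driver is on the classpath and uses an in-memory database; the `accounts` table, the class name, and the method names are hypothetical and exist only for this illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example class; assumes derby.jar is on the classpath.
class AtomicTransferSketch {

    // Debits one row and credits another. Both updates commit together,
    // or the rollback undoes whatever was already applied.
    static void transfer(Connection conn, int fromId, int toId, int amount) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            update.setInt(1, -amount);
            update.setInt(2, fromId);
            update.executeUpdate();
            update.setInt(1, amount);
            update.setInt(2, toId);
            if (update.executeUpdate() != 1) {
                throw new SQLException("Unknown target account " + toId);
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // the debit is undone as well
            throw e;
        } finally {
            conn.setAutoCommit(true);
        }
    }

    static int balance(Connection conn, int id) throws SQLException {
        try (PreparedStatement query = conn.prepareStatement(
                "SELECT balance FROM accounts WHERE id = ?")) {
            query.setInt(1, id);
            try (ResultSet rs = query.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }

    // Runs one valid transfer and one that fails halfway, then returns
    // the balances of accounts 1 and 2.
    static int[] demo(String dbName) throws SQLException {
        String url = "jdbc:derby:memory:" + dbName + ";create=true";
        try (Connection conn = DriverManager.getConnection(url);
             Statement setup = conn.createStatement()) {
            setup.executeUpdate(
                "CREATE TABLE accounts (id INT NOT NULL PRIMARY KEY, balance INT NOT NULL)");
            setup.executeUpdate("INSERT INTO accounts VALUES (1, 100), (2, 50)");
            transfer(conn, 1, 2, 30);
            try {
                transfer(conn, 1, 99, 30); // account 99 does not exist
            } catch (SQLException expected) {
                // the failed transfer has been rolled back
            }
            return new int[] { balance(conn, 1), balance(conn, 2) };
        }
    }

    public static void main(String[] args) throws SQLException {
        int[] balances = demo("transferDemo");
        System.out.println(balances[0] + " " + balances[1]);
    }
}
```

The second transfer debits account 1 before discovering that the target row is missing. Because the debit and the credit belong to one transaction, the rollback restores account 1, which is the behavior a hand-rolled file format would have to implement for itself.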
The most common format is the flat file, perhaps because of legacy decisions and the lack, in the past, of an embeddable, small-footprint database like Apache Derby. Some databases, such as FileMaker Pro, even build on a flat file system. Developers might have chosen a flat file system for their data storage requirements because it was perceived to be fast and simple (for operational and management reasons, not from a coding standpoint). However, this architecture has clear disadvantages: it requires a lot of hand coding and is prone to errors.
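As a rough illustration of that hand coding, the following sketch answers one simple question against a flat file. The pipe-delimited layout (`id|lastName|region`), the class name, and the method name are assumptions made up for this example, not a format the chapter prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat-file layout, one account per line: id|lastName|region
class FlatFileAccounts {

    // Scans every line and returns the ids of accounts whose last name
    // contains the fragment and whose region matches. Parsing, validation,
    // and knowledge of the file location all live in application code.
    static List<String> findIds(Path file, String fragment, String region) throws IOException {
        List<String> ids = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\|", -1);
            if (fields.length != 3) {
                throw new IOException("Malformed record: " + line);
            }
            if (fields[1].contains(fragment) && fields[2].equals(region)) {
                ids.add(fields[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("accounts", ".txt");
        Files.write(file, List.of("1|Anderson|East", "2|Sanders|West", "3|Sanderson|East"));
        System.out.println(findIds(file, "son", "East"));
        Files.delete(file);
    }
}
```

Every new question (a different field, a sort order, a second file to combine) means another method like this one, and any change to the record layout touches all of them.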
Another common approach is XML; most programming languages implement standard programming interfaces for reading and writing XML (the XML specification itself is an open standard maintained by the World Wide Web Consortium). Like flat files, XML files can be easy to understand when you open them in an XML editor; however, as your data and the relationships between your data elements become more complex, designing a single XML file to store your data becomes a non-trivial exercise.
Storing data in XML files forces you to implement locking mechanisms to ensure that the files are always in a consistent state. In addition, as the amount of data you have to store in the file grows, your application will either require more system resources to store the entire XML file in memory, or have to reread and parse the XML file every time it has to retrieve data.
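The memory-versus-reparse trade-off can be seen in a small sketch that uses the DOM parser shipped with the JDK. The `<accounts>` document shape, the attribute names, and the class name are hypothetical; the point is only that a single lookup parses the whole document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical XML store: <accounts><account id="..." region="..."/></accounts>
class XmlAccountLookup {

    // Builds the full DOM tree in memory just to read one attribute.
    // Returns null when no account carries the requested id.
    static String regionOf(String xml, String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList accounts = doc.getElementsByTagName("account");
        for (int i = 0; i < accounts.getLength(); i++) {
            Element account = (Element) accounts.item(i);
            if (id.equals(account.getAttribute("id"))) {
                return account.getAttribute("region");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<accounts>"
                + "<account id=\"1\" region=\"East\"/>"
                + "<account id=\"2\" region=\"West\"/>"
                + "</accounts>";
        System.out.println(regionOf(xml, "2"));
    }
}
```

An application can cache the parsed `Document` to avoid reparsing, but then the entire data set stays resident in memory, and any concurrent writer still needs a locking scheme of its own.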
XML files are a great means of communication between applications that know nothing about each other. In fact, XML is great for a lot of reasons. But as a data store, it requires a lot of effort for anything beyond trivial data persistence scenarios. Of course, advanced databases like DB2 UDB can provide XML repositories in combination with all the benefits that accrue from years of experience in the relational world.
One of the most significant benefits of using a database such as Apache Derby is that it delivers atomicity, consistency, isolation, and durability (ACID) compliance. Not all open source or proprietary databases do that. Databases exist for the purpose of providing a reliable and permanent storage mechanism that encompasses very strict properties embodied by these ACID characteristics. Although it is outside the scope of this chapter to discuss relational theory, you can learn more about ACID transactional processing and why it is so important from the "bible" of database theory: An Introduction to Database Systems, by C. J. Date (Addison-Wesley, 2004).
Another benefit of a relational data store is the concept of relationships. In fact, that is what relational databases (as the name would imply) are all about: set relationships. Whereas mathematicians can use algebra and set theory to no end for theoretical proofs, what a relational database delivers is a mechanism by which entities can be easily traversed and assembled, in any direction, with minimal coding effort. Need to know all the accounts with a last name that contains a specific group of characters? What about folks in the Eastern region? What happens to your application logic and pointer files when you move this data to another storage device in a flat file system? All this is transparently retrievable and navigable using a relational database, without code changes.
How is this possible? What is this ‘magic’ language by which you can interact with your data in an abstracted way without caring about its location? Structured Query Language (SQL): it can be used (in one form or another) to access all sorts of relational databases. SQL is a declarative language: you describe the data you want rather than the steps needed to retrieve it, and the core language does not contain variables, loops, or other procedural constructs (though there are procedural cousins that incorporate them). It is easy to learn, even for non-programmers.
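The earlier questions about last names and the Eastern region can be expressed as a single declarative query. The sketch below assumes the Apache Derby embedded driver is on the classpath and uses an in-memory database; the `customers` table, its columns, and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; assumes derby.jar is on the classpath.
class DeclarativeQuerySketch {

    // States what is wanted (names containing a fragment, in one region);
    // the database decides how to find the rows.
    static List<String> lastNamesMatching(Connection conn, String fragment, String region)
            throws SQLException {
        String sql = "SELECT last_name FROM customers "
                   + "WHERE last_name LIKE ? AND region = ? ORDER BY last_name";
        List<String> names = new ArrayList<>();
        try (PreparedStatement query = conn.prepareStatement(sql)) {
            query.setString(1, "%" + fragment + "%");
            query.setString(2, region);
            try (ResultSet rs = query.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }

    // Creates a small sample table and runs the query against it.
    static List<String> demo(String dbName) throws SQLException {
        String url = "jdbc:derby:memory:" + dbName + ";create=true";
        try (Connection conn = DriverManager.getConnection(url);
             Statement setup = conn.createStatement()) {
            setup.executeUpdate("CREATE TABLE customers (id INT NOT NULL PRIMARY KEY, "
                    + "last_name VARCHAR(40) NOT NULL, region VARCHAR(20) NOT NULL)");
            setup.executeUpdate("INSERT INTO customers VALUES (1, 'Anderson', 'East'), "
                    + "(2, 'Sanders', 'West'), (3, 'Sanderson', 'East'), (4, 'Smith', 'East')");
            return lastNamesMatching(conn, "son", "East");
        }
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(demo("queryDemo"));
    }
}
```

Nothing in the query refers to files, offsets, or storage devices, so moving the database does not change this code beyond the connection URL.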
Using a data store that can process SQL makes the data model far more dynamic and flexible than a flat file system or similar approach allows. For example, if new information is deemed important, a query can be quickly written and added to the application. How the data must be accessed, and where it resides, are not issues. With a flat file system, these details were typically well known only to the original application developer; with a relational database system, access to the data and the path to it are abstracted from the developer.
With a flat file system, adding a new structure to the data model requires a new file (or the editing of an existing one), the registration of (or pointer updates to) the file, and so on. With an XML file system, you would have to modify your existing XML file structure and all of the methods that make assumptions based on the existing structure, or add another XML file to contain the new structure, and then implement a complex (and customized) join operation between the two XML files. Today, with Apache Derby relational database technology, it is as simple as a CREATE TABLE statement to add a new structure and a SELECT statement to return the new data.
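A minimal sketch of that last point follows, again assuming the Apache Derby embedded driver is on the classpath and an in-memory database. The `customers` and `orders` tables and the class name are hypothetical; an existing table is extended with a new, related structure and then queried across both.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; assumes derby.jar is on the classpath.
class NewStructureSketch {

    // Returns one "lastName:product" string per order.
    static List<String> demo(String dbName) throws SQLException {
        String url = "jdbc:derby:memory:" + dbName + ";create=true";
        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement()) {
            // The structure the application already has.
            st.executeUpdate("CREATE TABLE customers (id INT NOT NULL PRIMARY KEY, "
                    + "last_name VARCHAR(40) NOT NULL)");
            st.executeUpdate("INSERT INTO customers VALUES (1, 'Anderson'), (2, 'Sanders')");

            // The new structure: one CREATE TABLE statement.
            st.executeUpdate("CREATE TABLE orders (id INT NOT NULL PRIMARY KEY, "
                    + "customer_id INT NOT NULL REFERENCES customers(id), "
                    + "product VARCHAR(40) NOT NULL)");
            st.executeUpdate("INSERT INTO orders VALUES (10, 1, 'widget'), (11, 2, 'gadget')");

            // One SELECT returns the new data joined to the old.
            List<String> rows = new ArrayList<>();
            try (ResultSet rs = st.executeQuery(
                    "SELECT c.last_name, o.product FROM customers c "
                    + "JOIN orders o ON o.customer_id = c.id ORDER BY o.id")) {
                while (rs.next()) {
                    rows.add(rs.getString(1) + ":" + rs.getString(2));
                }
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(demo("structureDemo"));
    }
}
```

The join that would need custom code across two XML files is a single clause here, and the existing `customers` table and the code that reads it are untouched.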
The use of SQL also opens up a database to a wide variety of interfaces and access methods that would all have to be hand-coded when working with a flat file system. These interfaces mean a shortened and less expensive development cycle for applications, along with a readily available talent pool of developers and administrators.
Today’s computers are orders of magnitude more powerful than those of the previous two generations. Some years ago, many decisions to use flat files instead of a relational database were driven by the cost/power ratio of computers. When an Intel 486 processor with 64 MB of RAM was the most powerful (and expensive) computer around, economics dictated the use of a flat file system. Today, computers and personal devices have moved far beyond the early-adopter phase of their respective product lifecycle curves, and they can host a lot of data at relatively low cost.
Relational databases are very scalable as well. You can start with a database that resides on a tablet for a single user and move it to a large symmetric multiprocessor (SMP) machine that is used to support thousands of users and literally millions of transactions per minute.
Finally, the manageability of a relational database was thought to require an expert skill set, and thus it was deemed inappropriate for many applications. Databases like Apache Derby do not require users to perform table reorganizations or know anything about heap-related tuning parameters (if you do not know what these are, don’t worry—that is the point).