Structural Mapping Patterns
When people talk about object-relational mapping, mostly what they mean is these kinds of structural mapping patterns, which you use when mapping between in-memory objects and database tables. These patterns aren't usually relevant for Table Data Gateway (144), but you may use a few of them if you use Row Data Gateway (152) or Active Record (160). You'll probably need to use all of them for Data Mapper (165).
Mapping Relationships
The central issue here is the different way in which objects and relations handle links, which leads to two problems. First there's a difference in representation. Objects handle links by storing references that are held by the runtime of either memory-managed environments or memory addresses. Relational databases handle links by forming a key into another table. Second, objects can easily use collections to handle multiple references from a single field, while normalization forces all relation links to be single valued. This leads to reversals of the data structure between objects and tables. An order object naturally has a collection of line item objects that don't need any reference back to the order. However, the table structure is the other way aroundthe line item must include a foreign key reference to the order since the order can't have a multivalued field.
The way to handle the representation problem is to keep the relational identity of each object as an Identity Field (216) in the object, and to look up these values to map back and forth between the object references and the relational keys. It's a tedious process but not that difficult once you understand the basic technique. When you read objects from the disk you use an Identity Map (195) as a lookup table from relational keys to objects. Each time you come across a foreign key in the table, you use Foreign Key Mapping (236) (see Figure 3.5) to wire up the appropriate inter-object reference. If you don't have the key in the Identity Map (195), you need to either go to the database to get it or use a Lazy Load (200). Each time you save an object, you save it into the row with the right key. Any inter-object reference is replaced with the target object's ID field.
Figure 3.5 Use a Foreign Key Mapping (236) to map a single-valued field.
On this foundation the collection handling requires a more complex version of Foreign Key Mapping (236) (see Figure 3.6). If an object has a collection, you need to issue another query to find all the rows that link to the ID of the source object (or you can now avoid the query with Lazy Load (200)). Each object that comes back gets created and added to the collection. Saving the collection involves saving each object in it and making sure it has a foreign key to the source object. This gets messy, especially when you have to detect objects added or removed from the collection. This can get repetitive when you get the hang of it, which is why some form of metadata-based approach becomes an obvious move for larger systems (I'll elaborate on that later). If the collection objects aren't used outside the scope of the collection's owner, you can use Dependent Mapping (262) to simplify the mapping.
Figure 3.6 Use a Foreign Key Mapping (236) to map a collection field.
objects aren't used outside the scope of the collection's owner, you can use Dependent Mapping (262) to simplify the mapping.
A different case comes up with a many-to-many relationship, which has a collection on both ends. An example is a person having many skills and each skill knowing the people who use it. Relational databases can't handle this directly, so you use an Association Table Mapping (248) (see Figure 3.7) to create a new relational table just to handle the many-to-many association.
Figure 3.7 Use an Association Table Mapping (248) to map a many-to-many association.
When you're working with collections, a common gotcha is to rely on the ordering within the collection. In OO languages it's common to use ordered collections such as lists and arraysindeed, it often makes testing easier. Nevertheless, it's very difficult to maintain an arbitrarily ordered collection when saved to a relational database. For this reason it's worth considering using unordered sets for storing collections. Another option is to decide on a sort order whenever you do a collection query, although that can be quite expensive.
In some cases referential integrity can make updates more complex. Modern systems allow you to defer referential integrity checking to the end of the transaction. If you have this capability, there's no reason not to use it. Otherwise, the database will check on every write. In this case you have to be careful to do your updates in the right order. How to do this is out of the scope of this book, but one technique is to do a topological sort of your updates. Another is to hardcode which tables get written in which order. This can sometimes reduce deadlock problems inside the database that cause transactions to roll back too often.
Identity Field (216) is used for inter-object references that turn into foreign keys, but not all object relationships need to be persisted that way. Small Value Objects (486), such as date ranges and money objects clearly shouldn't be represented as their own table in the database. Instead, take all the fields of the Value Object (486) and embed them into the linked object as a Embedded Value (268). Since Value Objects (486) have value semantics, you can happily create them each time you get a read and you don't need to bother with an Identity Map (195). Writing them out is also easyjust dereference the object and spit out its fields into the owning table.
You can do this kind of thing on a larger scale by taking a whole cluster of objects and saving them as a single column in a table as a Serialized LOB (272). LOB stands for "Large OBject," which can be either binary (BLOB) textual (CLOBCharacter Large OBject). Serializing a clump of objects as an XML document is an obvious route to take for a hierarchic object structure. This way you can grab a whole bunch of small linked objects in a single read. Often databases perform poorly with small highly interconnected objectswhere you spend a lot of time making many small database calls. Hierarchic structures such as org charts and bills of materials are where a Serialized LOB (272) can save a lot of database roundtrips.
The downside is that SQL isn't aware of what's happening, so you can't make portable queries against the data structure. Again, XML may come to the rescue here, allowing you to embed XPath query expressions within SQL calls, although the embedding is largely nonstandard at the moment. As a result Serialized LOB (272) is best used when you don't want to query for the parts of the stored structure.
Usually a Serialized LOB (272) is best for a relatively isolated group of objects that make part of an application. If you use it too much, it ends up turning your database into little more than a transactional file system.
Inheritance
In the above hierarchies I'm talking about compositional hierarchies, such as a parts tree, which relational system traditionally do poorly. There's another kind of hierarchy that causes relational headaches: a class hierarchy linked by inheritance. Since there's no standard way to do inheritance in SQL, we again have a mapping to perform. For any inheritance structure there are basically three options. You can have a one table for all the classes in the hierarchy: Single Table Inheritance (278) (see Figure 3.8); one table for each concrete class: Concrete Table Inheritance (293) (see Figure 3.9); or one table per class in the hierarchy; Class Table Inheritance (285) (see Figure 3.10).
Figure 3.8 Single Table Inheritance (278) uses one table to store all the classes in a hierarchy.
Figure 3.9 Concrete Table Inheritance (293) uses one table to store each concrete class in a ierarchy.
Figure 3.10 \Class Table Inheritance (285) uses one table for each class in a hierarchy.
The trade-offs are all between duplication of data structure and speed of access. Class Table Inheritance (285) is the simplest relationship between the classes and the tables, but it needs multiple joins to load a single object, which usually reduces performance. Concrete Table Inheritance (293) avoids the joins, allowing you pull a single object from one table, but it's brittle to changes. With any change to a superclass you have to remember to alter all the tables (and the mapping code). Altering the hierarchy itself can cause even bigger changes. Also, the lack of a superclass table can make key management awkward and get in the way of referential integrity, although it does reduce lock contention on the superclass table. In some databases Single Table Inheritance (278)'s biggest downside is wasted space, since each row has to have columns for all possible subtypes and this leads to empty columns. However, many databases do a very good job of compressing wasted table space. Another problem with Single Table Inheritance (278) is its size, making it a bottleneck for accesses. Its great advantage is that it puts all the stuff in one place, which makes modification easier and avoids joins.
The three options aren't mutually exclusive, and in one hierarchy you can mix patterns. For instance, you could have several classes pulled together with Single Table Inheritance (278) and use Class Table Inheritance (285) for a few unusual cases. Of course, mixing patterns adds complexity.
There's no clearcut winner here. You need to take into account your own circumstances and preferences, much as with all the rest of these patterns. My first choice tends to be Single Table Inheritance (278), as it's easy to do and is resilient to many refactorings. I tend to use the other two as needed to help solve the inevitable issues with irrelevant and wasted columns. Often the best is to talk to the DBAs; they often have good advice as to the sort of access that makes the most sense for the database.
All the examples just described, and in the patterns, use single inheritance. Although multiple inheritance is becoming less fashionable these days and most languages are increasingly avoiding it, the issue still appears in O/R mapping when you use interfaces, as in Java and .NET. The patterns here don't go into this topic specifically, but essentially you cope with multiple inheritance using variations of the trio of inheritance patterns. Single Table Inheritance (278) puts all superclasses and interfaces into the one big table, Class Table Inheritance (285) makes a separate table for each interface and superclass, and Concrete Table Inheritance (293) includes all interfaces and superclasses in each concrete table.