Data Mapper
A layer of Mappers (473) that moves data between objects and a database while keeping them independent of each other and the mapper itself.
Objects and relational databases have different mechanisms for structuring data. Many parts of an object, such as collections and inheritance, aren't present in relational databases. When you build an object model with a lot of business logic it's valuable to use these mechanisms to better organize the data and the behavior that goes with it. Doing so leads to variant schemas; that is, the object schema and the relational schema don't match up.
You still need to transfer data between the two schemas, and this data transfer becomes a complexity in its own right. If the in-memory objects know about the relational database structure, changes in one tend to ripple to the other.
The Data Mapper is a layer of software that separates the in-memory objects from the database. Its responsibility is to transfer data between the two and also to isolate them from each other. With Data Mapper the in-memory objects needn't even know that there's a database present; they need no SQL interface code, and certainly no knowledge of the database schema. (The database schema is always ignorant of the objects that use it.) Since it's a form of Mapper (473), Data Mapper itself is even unknown to the domain layer.
How It Works
The separation between domain and data source is the main function of a Data Mapper, but there are plenty of details that have to be addressed to make this happen. There's also a lot of variety in how mapping layers are built. Many of the comments here are pretty broad, because I try to give a general overview of what you need to separate the cat from its skin.
We'll start with a very basic Data Mapper example. This is the simplest style of this layer that you can have, and it might not seem worth doing. For simple database mapping, other patterns are usually simpler and thus better. If you are going to use Data Mapper at all, you usually need more complicated cases. However, it's easier to explain the ideas if we start at a very basic level.
A simple case would have a Person and Person Mapper class. To load a person from the database, a client would call a find method on the mapper (Figure 10.3). The mapper uses an Identity Map (195) to see if the person is already loaded; if not, it loads it.
Updates are shown in Figure 10.4. A client asks the mapper to save a domain object. The mapper pulls the data out of the domain object and shuttles it to the database.
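The find and save flows can be sketched as follows. This is a minimal illustration, assuming a Map of rows stands in for the people table so that no JDBC is involved; PersonMapper, find, update, and the reads counter are illustrative names rather than a fixed API.

```java
import java.util.HashMap;
import java.util.Map;

// Domain object: no SQL and no knowledge of the table layout.
class Person {
    private final Long id;
    private String lastName;
    private int numberOfDependents;

    Person(Long id, String lastName, int numberOfDependents) {
        this.id = id;
        this.lastName = lastName;
        this.numberOfDependents = numberOfDependents;
    }
    Long getId() { return id; }
    String getLastName() { return lastName; }
    int getNumberOfDependents() { return numberOfDependents; }
    void setNumberOfDependents(int n) { numberOfDependents = n; }
}

// Hypothetical mapper; a Map of {lastname, dependents} rows stands in for the database.
class PersonMapper {
    private final Map<Long, Object[]> table;
    private final Map<Long, Person> identityMap = new HashMap<Long, Person>();
    int reads = 0;                                   // counts simulated database hits

    PersonMapper(Map<Long, Object[]> table) { this.table = table; }

    Person find(Long id) {
        Person cached = identityMap.get(id);
        if (cached != null) return cached;           // already loaded: no database trip
        reads++;
        Object[] row = table.get(id);
        if (row == null) return null;
        Person result = new Person(id, (String) row[0], (Integer) row[1]);
        identityMap.put(id, result);
        return result;
    }

    void update(Person p) {
        // Pull the data out of the domain object and shuttle it to the "database".
        table.put(p.getId(), new Object[] {p.getLastName(), p.getNumberOfDependents()});
    }
}
```

Because the identity map is consulted first, a second find for the same ID returns the same instance without another read.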
The whole layer of Data Mapper can be substituted, either for testing purposes or to allow a single domain layer to work with different databases.
A simple Data Mapper would just map a database table to an equivalent in-memory class on a field-to-field basis. Of course, things aren't usually simple. Mappers need a variety of strategies to handle classes that turn into multiple fields, classes that have multiple tables, classes with inheritance, and the joys of connecting together objects once they've been sorted out. The various object-relational mapping patterns in this book are all about that. It's usually easier to deploy these patterns with a Data Mapper than it is with the other organizing alternatives.
When it comes to inserts and updates, the database mapping layer needs to understand what objects have changed, which new ones have been created, and which ones have been destroyed. It also has to fit the whole workload into a transactional framework. The Unit of Work (184) pattern is a good way to organize this.
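A Unit of Work (184) can be sketched in a few lines. This version is an illustration only: it assumes a hypothetical Mapper interface with insert, update, and delete methods, and uses strings as stand-in domain objects so the commit order can be checked without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical mapper contract that the unit of work drives at commit time.
interface Mapper<T> {
    void insert(T obj);
    void update(T obj);
    void delete(T obj);
}

// Minimal Unit of Work: records what changed, then writes it all in one go.
class UnitOfWork<T> {
    private final Set<T> newObjects = new LinkedHashSet<T>();
    private final Set<T> dirtyObjects = new LinkedHashSet<T>();
    private final Set<T> removedObjects = new LinkedHashSet<T>();
    private final Mapper<T> mapper;

    UnitOfWork(Mapper<T> mapper) { this.mapper = mapper; }

    void registerNew(T obj) { newObjects.add(obj); }

    void registerDirty(T obj) {
        // A new object will be inserted with its current state anyway.
        if (!newObjects.contains(obj)) dirtyObjects.add(obj);
    }

    void registerRemoved(T obj) {
        // Removing an object that was never inserted just cancels the insert.
        if (newObjects.remove(obj)) return;
        dirtyObjects.remove(obj);
        removedObjects.add(obj);
    }

    void commit() {
        // In a real mapper these calls would run inside one database transaction.
        for (T obj : newObjects) mapper.insert(obj);
        for (T obj : dirtyObjects) mapper.update(obj);
        for (T obj : removedObjects) mapper.delete(obj);
        newObjects.clear();
        dirtyObjects.clear();
        removedObjects.clear();
    }
}

// Records the calls so the ordering can be checked without a database.
class RecordingMapper implements Mapper<String> {
    final List<String> log = new ArrayList<String>();
    public void insert(String s) { log.add("insert " + s); }
    public void update(String s) { log.add("update " + s); }
    public void delete(String s) { log.add("delete " + s); }
}
```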
Figure 10.3 suggests that a single request to a find method results in a single SQL query. This isn't always true. Loading a typical order with multiple order lines may involve loading the order lines as well. The request from the client will usually lead to a graph of objects being loaded, with the mapper designer deciding exactly how much to pull back in one go. The point of this is to minimize database queries, so the finders typically need to know a fair bit about how clients use the objects in order to make the best choices for pulling data back.
This example leads to cases where you load multiple classes of domain objects from a single query. If you want to load orders and order lines, it will usually be faster to do a single query that joins the orders and order line tables. You then use the result set to load both the order and the order line instances (page 243).
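Loading both classes from one joined result set can be sketched like this. The sketch assumes each Object[] mimics one row of a join over hypothetical orders and order_lines tables, already sorted by order; a map keyed by order ID makes sure each order is built once while every row contributes one line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderLine {
    final long id; final String product; final int quantity;
    OrderLine(long id, String product, int quantity) {
        this.id = id; this.product = product; this.quantity = quantity;
    }
}

class Order {
    final long id; final String customer;
    final List<OrderLine> lines = new ArrayList<OrderLine>();
    Order(long id, String customer) { this.id = id; this.customer = customer; }
}

class OrderMapper {
    // Each row mimics one row of (table and column names are assumptions):
    //   SELECT o.id, o.customer, l.id, l.product, l.quantity
    //   FROM orders o JOIN order_lines l ON l.order_id = o.id ORDER BY o.id, l.id
    List<Order> loadAll(List<Object[]> joinedRows) {
        Map<Long, Order> orders = new LinkedHashMap<Long, Order>();
        for (Object[] row : joinedRows) {
            long orderId = (Long) row[0];
            Order order = orders.get(orderId);
            if (order == null) {                     // first row for this order
                order = new Order(orderId, (String) row[1]);
                orders.put(orderId, order);
            }
            // Every row carries exactly one order line.
            order.lines.add(new OrderLine((Long) row[2], (String) row[3], (Integer) row[4]));
        }
        return new ArrayList<Order>(orders.values());
    }
}
```

With an inner join, an order that has no lines produces no rows; an outer join plus a null check on the line columns would cover that case.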
Since objects are very interconnected, you usually have to stop pulling the data back at some point. Otherwise, you're likely to pull back the entire database with a request. Again, mapping layers have techniques to deal with this while minimizing the impact on the in-memory objects, using Lazy Load (200). Hence, the in-memory objects can't be entirely ignorant of the mapping layer. They may need to know about the finders and a few other mechanisms.
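One simple form of Lazy Load (200) is a value holder that the mapper hands to the domain object in place of the data. The sketch below is illustrative: LazyList and Album are hypothetical names, the loader would in practice run a finder query, and the holder is not thread-safe. It also shows why the domain object can't be entirely ignorant of the mapping layer, since Album has to hold a LazyList.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical lazy holder: the graph stops being pulled at this point
// until someone actually asks for the data.
class LazyList<T> {
    private final Supplier<List<T>> loader;
    private List<T> value;                 // null until first access
    int loadCount = 0;                     // how often the loader actually ran

    LazyList(Supplier<List<T>> loader) { this.loader = loader; }

    List<T> get() {
        if (value == null) {
            value = loader.get();          // e.g. a finder query, run only on demand
            loadCount++;
        }
        return value;
    }
}

class Album {
    final String title;
    private final LazyList<String> tracks;
    Album(String title, LazyList<String> tracks) { this.title = title; this.tracks = tracks; }
    List<String> getTracks() { return tracks.get(); }
}
```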
An application can have one Data Mapper or several. If you're hardcoding your mappers, it's best to use one for each domain class or root of a domain hierarchy. If you're using Metadata Mapping (306), you can get away with a single mapper class. In the latter case the limiting problem is your find methods. With a large application it can be too much to have a single mapper with lots of find methods, so it makes sense to split these methods up by each domain class or head of the domain hierarchy. You get a lot of small finder classes, but it's easy for a developer to locate the finder she needs.
As with any database find behavior, the finders need to use an Identity Map (195) in order to maintain the identity of the objects read from the database. Either you can have a Registry (480) of Identity Maps (195), or you can have each finder hold an Identity Map (195) (provided there is only one finder per class per session).
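A Registry (480) of Identity Maps (195) can be as small as a map of maps keyed by class, since an ID is usually only unique within one table. This is a sketch under the assumption that one registry instance lives for one session; IdentityMapRegistry and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-session registry: one identity map per domain class.
class IdentityMapRegistry {
    private final Map<Class<?>, Map<Long, Object>> maps = new HashMap<Class<?>, Map<Long, Object>>();

    private Map<Long, Object> mapFor(Class<?> type) {
        Map<Long, Object> map = maps.get(type);
        if (map == null) {                         // create the map on first use
            map = new HashMap<Long, Object>();
            maps.put(type, map);
        }
        return map;
    }

    // Returns null if no object of this type and ID has been loaded yet.
    <T> T get(Class<T> type, Long id) {
        return type.cast(mapFor(type).get(id));
    }

    <T> void put(Class<T> type, Long id, T obj) {
        mapFor(type).put(id, obj);
    }
}
```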
Handling Finders
In order to work with an object, you have to load it from the database. Usually the presentation layer will initiate things by loading some initial objects. Then control moves into the domain layer, at which point the code will mainly move from object to object using associations between them. This will work effectively provided that the domain layer has all the objects it needs loaded into memory or that you use Lazy Load (200) to load in additional objects when needed.
On occasion you may need the domain objects to invoke find methods on the Data Mapper. However, I've found that with a good Lazy Load (200) you can completely avoid this. For simpler applications, though, it may not be worth trying to manage everything with associations and Lazy Load (200). Still, you don't want to add a dependency from your domain objects to your Data Mapper.
You can solve this dilemma by using Separated Interface (476). Put any find methods needed by the domain code into an interface class that you can place in the domain package.
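A compact sketch of that arrangement: the finder interface and a small registry sit on the domain side, and the mapper implements the interface from the other side. PersonFinder, FinderRegistry, and the manager association are hypothetical names chosen for illustration, and the mapper stores Person objects directly in a Map in place of real table access.

```java
import java.util.HashMap;
import java.util.Map;

// --- domain package: knows only the interface ---
interface PersonFinder {
    Person find(Long id);
}

class Person {
    final Long id; final String name; final Long managerId;
    Person(Long id, String name, Long managerId) {
        this.id = id; this.name = name; this.managerId = managerId;
    }
    // Domain logic uses a finder without depending on any mapper class.
    Person getManager() {
        return managerId == null ? null : FinderRegistry.person().find(managerId);
    }
}

// Hypothetical registry through which the domain reaches its finders.
class FinderRegistry {
    private static PersonFinder personFinder;
    static void setPersonFinder(PersonFinder f) { personFinder = f; }
    static PersonFinder person() { return personFinder; }
}

// --- mapper package: implements the domain's interface ---
class PersonMapper implements PersonFinder {
    private final Map<Long, Person> rows = new HashMap<Long, Person>();  // stands in for the database
    void add(Person p) { rows.put(p.id, p); }
    public Person find(Long id) { return rows.get(id); }
}
```

The dependency now points from the mapper package to the domain package, never the other way.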
Mapping Data to Domain Fields
Mappers need access to the fields in the domain objects. Often this can be a problem because you need public methods to support the mappers that you don't want for domain logic. (I'm assuming that you won't commit the cardinal sin of making fields public.) There's no easy answer to this. You could use a lower level of visibility by packaging the mappers closer to the domain objects, such as in the same package in Java, but this confuses the bigger dependency picture because you don't want other parts of the system that know the domain objects to know about the mappers. You can use reflection, which can often bypass the visibility rules of the language. It's slower, but the slower speed may end up as just a rounding error compared to the time taken by the SQL call. Or you can use public methods, but guard them with a status field so that they throw an exception if they're used outside the context of a database load. If so, name them in such a way that they're not mistaken for regular getters and setters.
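The reflection option can be sketched with java.lang.reflect.Field. This is a minimal illustration; ReflectiveLoader is a hypothetical helper name, and it only finds fields declared directly on the object's class, not inherited ones.

```java
import java.lang.reflect.Field;

// Domain class with no public setter for lastName.
class Person {
    private String lastName;
    String getLastName() { return lastName; }
}

// Hypothetical mapper helper that writes a private field by name.
class ReflectiveLoader {
    static void setField(Object target, String fieldName, Object value) {
        try {
            // getDeclaredField looks only at the target's own class, not superclasses.
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);   // bypasses private access; module or security rules may forbid it
            field.set(target, value);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalStateException("Cannot map field " + fieldName, e);
        }
    }
}
```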
Tied to this is the issue of when you create the object. In essence you have two options. One is to create the object with a rich constructor so that it's at least created with all its mandatory data. The other is to create an empty object and then populate it with the mandatory data. I usually prefer the former since it's nice to have a well-formed object from the start. This also means that, if you have an immutable field, you can enforce it by not providing any method to change its value.
The problem with a rich constructor is that you have to be aware of cyclic references. If you have two objects that reference each other, each time you try to load one it will try to load the other, which will in turn try to load the first one, and so on, until you run out of stack space. Avoiding this requires special case code, often using Lazy Load (200). Writing this special case code is messy, so it's worth trying to do without it. You can do this by creating an empty object. Use a no-arg constructor to create a blank object and insert that empty object immediately into the Identity Map (195). That way, if you have a cycle, the Identity Map (195) will return an object to stop the recursive loading.
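The create-empty, register, then populate order is what breaks the cycle. In this sketch two people reference each other as spouses; the rows are {name, spouseId} held in a Map, and the names are illustrative. When loading Bob reaches back to Ann, the identity map returns the Ann object that is still being loaded instead of recursing again.

```java
import java.util.HashMap;
import java.util.Map;

class Person {
    Long id;
    String name;
    Person spouse;          // cyclic reference: a.spouse == b and b.spouse == a
    Person() {}             // no-arg constructor for the empty object
}

// Hypothetical mapper; rows are {name, spouseId} keyed by id.
class PersonMapper {
    private final Map<Long, Object[]> table;
    private final Map<Long, Person> identityMap = new HashMap<Long, Person>();

    PersonMapper(Map<Long, Object[]> table) { this.table = table; }

    Person find(Long id) {
        Person cached = identityMap.get(id);
        if (cached != null) return cached;   // stops the recursion on a cycle
        Object[] row = table.get(id);
        if (row == null) return null;
        Person result = new Person();        // 1. create an empty object
        result.id = id;
        identityMap.put(id, result);         // 2. register it before following references
        result.name = (String) row[0];       // 3. populate, which may recurse
        Long spouseId = (Long) row[1];
        result.spouse = spouseId == null ? null : find(spouseId);
        return result;
    }
}
```

The price is that other objects can briefly see a partially loaded object while the cycle is being resolved.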
Using an empty object like this means you may need some setters for values that are truly immutable when the object is loaded. A combination of a naming convention and perhaps some status-checking guards can fix this. You can also use reflection for data loading.
Metadata-Based Mappings
One of the decisions you need to make concerns storing the information about how fields in domain objects are mapped to columns in the database. The simplest, and often best, way to do this is with explicit code, which requires a mapper class for each domain object. The mapper does the mapping through assignments and has fields (usually constant strings) to store the SQL for database access. An alternative is to use Metadata Mapping (306), which stores the metadata as data, either in a class or in a separate file. The great advantage of metadata is that all the variation in the mappers can be handled through data without the need for more source code, either by use of code generation or reflective programming.
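As a taste of the metadata approach, here's a sketch in which a list of column mappings generates the SQL that an explicit mapper would hard-code. DataMap and ColumnMap are illustrative names; the fieldName entries are where a reflective loader (not shown) would take its cue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metadata: which domain field maps to which column.
class ColumnMap {
    final String columnName;
    final String fieldName;
    ColumnMap(String columnName, String fieldName) {
        this.columnName = columnName;
        this.fieldName = fieldName;
    }
}

class DataMap {
    final String tableName;
    final List<ColumnMap> columns = new ArrayList<ColumnMap>();
    DataMap(String tableName) { this.tableName = tableName; }

    DataMap add(String column, String field) {
        columns.add(new ColumnMap(column, field));
        return this;
    }

    // SQL is generated from the metadata instead of being hand-written per mapper.
    String columnList() {
        StringBuilder sb = new StringBuilder("id");
        for (ColumnMap c : columns) sb.append(", ").append(c.columnName);
        return sb.toString();
    }

    String findStatement() {
        return "SELECT " + columnList() + " FROM " + tableName + " WHERE id = ?";
    }

    String updateStatement() {
        StringBuilder sb = new StringBuilder("UPDATE " + tableName + " SET ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(columns.get(i).columnName).append(" = ?");
        }
        return sb.append(" WHERE id = ?").toString();
    }
}
```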
When to Use It
The primary occasion for using Data Mapper is when you want the database schema and the object model to evolve independently. The most common case for this is with a Domain Model (116). Data Mapper's primary benefit is that when working on the domain model you can ignore the database, both in design and in the build and testing process. The domain objects have no idea what the database structure is, because all the correspondence is done by the mappers.
This helps you in the code because you can understand and work with the domain objects without having to understand how they're stored in the database. You can modify the Domain Model (116) or the database without having to alter the other. With complicated mappings, particularly those involving existing databases, this is very valuable.
The price, of course, is the extra layer that you don't get with Active Record (160), so the test for using these patterns is the complexity of the business logic. If you have fairly simple business logic, you probably won't need a Domain Model (116) or a Data Mapper. More complicated logic leads you to Domain Model (116) and therefore to Data Mapper.
I wouldn't choose Data Mapper without Domain Model (116), but can I use Domain Model (116) without Data Mapper? If the domain model is pretty simple, and the database is under the domain model developers' control, then it's reasonable for the domain objects to access the database directly with Active Record (160). Effectively this puts the mapper behavior discussed here into the domain objects themselves. As things become more complicated, it's better to refactor the database behavior out into a separate layer.
Remember that you don't have to build a full-featured database-mapping layer. It's a complicated beast to build, and there are products available that do this for you. For most cases I recommend buying a database-mapping layer rather than building one yourself.
Example: A Simple Database Mapper (Java)
Here's an absurdly simple use of Data Mapper to give you a feel for the basic structure. Our example is a person with an isomorphic people table.
class Person...
  private String lastName;
  private String firstName;
  private int numberOfDependents;
The database schema looks like this:
create table people (
  ID int primary key,
  lastname varchar,
  firstname varchar,
  number_of_dependents int
)
We'll use the simple case here, where the Person Mapper class also implements the finder and Identity Map (195). However, I've added an abstract mapper Layer Supertype (475) to indicate where I can pull out some common behavior. Loading involves checking that the object isn't already in the Identity Map (195) and then pulling the data from the database.
The find behavior starts in the Person Mapper, which wraps calls to an abstract find method to find by ID.
class PersonMapper...
  protected String findStatement() {
    return "SELECT " + COLUMNS +
      " FROM people" +
      " WHERE id = ?";
  }
  public static final String COLUMNS = " id, lastname, firstname, number_of_dependents ";
  public Person find(Long id) {
    return (Person) abstractFind(id);
  }
  public Person find(long id) {
    return find(new Long(id));
  }

class AbstractMapper...
  protected Map loadedMap = new HashMap();
  abstract protected String findStatement();
  protected DomainObject abstractFind(Long id) {
    // Check the Identity Map first to avoid a database trip.
    DomainObject result = (DomainObject) loadedMap.get(id);
    if (result != null) return result;
    PreparedStatement findStatement = null;
    ResultSet rs = null;
    try {
      findStatement = DB.prepare(findStatement());
      findStatement.setLong(1, id.longValue());
      rs = findStatement.executeQuery();
      rs.next();
      result = load(rs);
      return result;
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(findStatement, rs);   // close the result set as well as the statement
    }
  }
The find method calls the load method, which is split between the abstract and person mappers. The abstract mapper checks the ID, pulling it from the data and registering the new object in the Identity Map (195).
class AbstractMapper...
  protected DomainObject load(ResultSet rs) throws SQLException {
    Long id = new Long(rs.getLong(1));
    if (loadedMap.containsKey(id)) return (DomainObject) loadedMap.get(id);
    DomainObject result = doLoad(id, rs);
    loadedMap.put(id, result);
    return result;
  }
  abstract protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException;

class PersonMapper...
  protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
    String lastNameArg = rs.getString(2);
    String firstNameArg = rs.getString(3);
    int numDependentsArg = rs.getInt(4);
    return new Person(id, lastNameArg, firstNameArg, numDependentsArg);
  }
Notice that the Identity Map (195) is checked twice, once by abstractFind and once by load. There's a reason for this madness.
I need to check the map in the finder because, if the object is already there, I can save myself a trip to the database—I always want to save myself that long hike if I can. But I also need to check in the load because I may have queries that I can't be sure of resolving in the Identity Map (195). Say I want to find everyone whose last name matches some search pattern. I can't be sure that I have all such people already loaded, so I have to go to the database and run a query.
class PersonMapper...
  private static String findLastNameStatement =
    "SELECT " + COLUMNS +
    " FROM people " +
    " WHERE UPPER(lastname) like UPPER(?)" +
    " ORDER BY lastname";
  public List findByLastName(String name) {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
      stmt = DB.prepare(findLastNameStatement);
      stmt.setString(1, name);
      rs = stmt.executeQuery();
      return loadAll(rs);
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(stmt, rs);
    }
  }

class AbstractMapper...
  protected List loadAll(ResultSet rs) throws SQLException {
    List result = new ArrayList();
    while (rs.next())
      result.add(load(rs));
    return result;
  }
When I do this I may pull back some rows in the result set that correspond to people I've already loaded. I have to ensure that I don't make a duplicate, so I have to check the Identity Map (195) again.
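The effect of the second check can be seen in an in-memory sketch of the same structure. It assumes a list of {id, lastname} rows in place of the people table and uses a simple prefix match in place of the LIKE query; the objectsCreated counter exists only to show that no duplicate Person is built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Person {
    final Long id; final String lastName;
    Person(Long id, String lastName) { this.id = id; this.lastName = lastName; }
}

// In-memory stand-in for the mapper; each row is {id, lastname}.
class PersonMapper {
    private final List<Object[]> table;
    private final Map<Long, Person> loadedMap = new HashMap<Long, Person>();
    int objectsCreated = 0;

    PersonMapper(List<Object[]> table) { this.table = table; }

    Person find(Long id) {
        Person result = loadedMap.get(id);
        if (result != null) return result;        // check 1: skip the query entirely
        for (Object[] row : table)
            if (row[0].equals(id)) return load(row);
        return null;
    }

    List<Person> findByLastNamePrefix(String prefix) {
        List<Person> result = new ArrayList<Person>();
        for (Object[] row : table)                 // the query always has to run
            if (((String) row[1]).startsWith(prefix)) result.add(load(row));
        return result;
    }

    private Person load(Object[] row) {
        Long id = (Long) row[0];
        Person existing = loadedMap.get(id);
        if (existing != null) return existing;     // check 2: no duplicate objects
        Person p = new Person(id, (String) row[1]);
        objectsCreated++;
        loadedMap.put(id, p);
        return p;
    }
}
```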
Writing a find method this way in each subclass that needs it involves some basic, but repetitive, coding, which I can eliminate by providing a general method.
class AbstractMapper...
  public List findMany(StatementSource source) {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
      stmt = DB.prepare(source.sql());
      for (int i = 0; i < source.parameters().length; i++)
        stmt.setObject(i+1, source.parameters()[i]);
      rs = stmt.executeQuery();
      return loadAll(rs);
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(stmt, rs);
    }
  }
For this to work I need an interface that wraps both the SQL string and the loading of parameters into the prepared statement.
interface StatementSource...
  String sql();
  Object[] parameters();
I can then use this facility by providing a suitable implementation as an inner class.
class PersonMapper...
  public List findByLastName2(String pattern) {
    return findMany(new FindByLastName(pattern));
  }
  static class FindByLastName implements StatementSource {
    private String lastName;
    public FindByLastName(String lastName) {
      this.lastName = lastName;
    }
    public String sql() {
      return
        "SELECT " + COLUMNS +
        " FROM people " +
        " WHERE UPPER(lastname) like UPPER(?)" +
        " ORDER BY lastname";
    }
    public Object[] parameters() {
      Object[] result = {lastName};
      return result;
    }
  }
This kind of work can be done in other places where there's repetitive statement invocation code. On the whole I've made the examples here more straightforward to make them easier to follow. If you find yourself writing a lot of repetitive straight-ahead code, you should consider doing something similar.
With the update, the JDBC code is specific to the subtype.
class PersonMapper...
  private static final String updateStatementString =
    "UPDATE people " +
    " SET lastname = ?, firstname = ?, number_of_dependents = ? " +
    " WHERE id = ?";
  public void update(Person subject) {
    PreparedStatement updateStatement = null;
    try {
      updateStatement = DB.prepare(updateStatementString);
      updateStatement.setString(1, subject.getLastName());
      updateStatement.setString(2, subject.getFirstName());
      updateStatement.setInt(3, subject.getNumberOfDependents());
      updateStatement.setLong(4, subject.getID().longValue());   // IDs are Longs; intValue() could truncate
      updateStatement.execute();
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(updateStatement);
    }
  }
For the insert, some code can be factored into the Layer Supertype (475).
class AbstractMapper...
  public Long insert(DomainObject subject) {
    PreparedStatement insertStatement = null;
    try {
      insertStatement = DB.prepare(insertStatement());
      subject.setID(findNextDatabaseId());
      insertStatement.setLong(1, subject.getID().longValue());
      doInsert(subject, insertStatement);
      insertStatement.execute();
      loadedMap.put(subject.getID(), subject);
      return subject.getID();
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(insertStatement);
    }
  }
  abstract protected String insertStatement();
  abstract protected void doInsert(DomainObject subject, PreparedStatement insertStatement)
      throws SQLException;

class PersonMapper...
  protected String insertStatement() {
    return "INSERT INTO people VALUES (?, ?, ?, ?)";
  }
  protected void doInsert(
      DomainObject abstractSubject,
      PreparedStatement stmt)
      throws SQLException
  {
    // Parameter 1 (the ID) has already been set by the Layer Supertype.
    Person subject = (Person) abstractSubject;
    stmt.setString(2, subject.getLastName());
    stmt.setString(3, subject.getFirstName());
    stmt.setInt(4, subject.getNumberOfDependents());
  }
Example: Separating the Finders (Java)
To allow domain objects to invoke finder behavior I can use Separated Interface (476) to separate the finder interfaces from the mappers (Figure 10.5). I can put these finder interfaces in a separate package that's visible to the domain layer, or, as in this case, I can put them in the domain layer itself.
One of the most common finds is one that finds an object according to a particular surrogate ID. Much of this processing is quite generic, so it can be handled by a suitable Layer Supertype (475). All it needs is a Layer Supertype (475) for domain objects that know about IDs.
The find methods themselves are declared in the finder interface. It's usually best not to make this interface generic, because the caller needs to know the specific return type.
interface ArtistFinder...
  Artist find(Long id);
  Artist find(long id);
The finder interface is best declared in the domain package with the finders held in a Registry (480). In this case I've made the mapper class implement the finder interface.
class ArtistMapper implements ArtistFinder...
  public Artist find(Long id) {
    return (Artist) abstractFind(id);
  }
  public Artist find(long id) {
    return find(new Long(id));
  }
The bulk of the find method is done by the mapper's Layer Supertype (475), which checks the Identity Map (195) to see if the object is already in memory. If not, it completes a prepared statement that's loaded in by the artist mapper and executes it.
class AbstractMapper...
  abstract protected String findStatement();
  protected Map loadedMap = new HashMap();
  protected DomainObject abstractFind(Long id) {
    DomainObject result = (DomainObject) loadedMap.get(id);
    if (result != null) return result;
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
      stmt = DB.prepare(findStatement());
      stmt.setLong(1, id.longValue());
      rs = stmt.executeQuery();
      rs.next();
      result = load(rs);
      return result;
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(stmt, rs);
    }
  }

class ArtistMapper...
  protected String findStatement() {
    return "select " + COLUMN_LIST + " from artists art where ID = ?";
  }
  public static final String COLUMN_LIST = "art.ID, art.name";
The find part of the behavior is about getting either the existing object or a new one. The load part is about putting the data from the database into a new object.
class AbstractMapper...
  protected DomainObject load(ResultSet rs) throws SQLException {
    Long id = new Long(rs.getLong("id"));
    if (loadedMap.containsKey(id)) return (DomainObject) loadedMap.get(id);
    DomainObject result = doLoad(id, rs);
    loadedMap.put(id, result);
    return result;
  }
  abstract protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException;

class ArtistMapper...
  protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
    String name = rs.getString("name");
    Artist result = new Artist(id, name);
    return result;
  }
Notice that the load method also checks the Identity Map (195). Although redundant in this case, the check matters because load can be called by other finders that haven't already done it. In this scheme all a subclass has to do is develop a doLoad method to load the actual data needed, and return a suitable SQL string from the findStatement method.
You can also do a find based on a query. Say we have a database of tracks and albums and we want a finder that will find all the tracks on a specified album. Again the interface declares the finders.
interface TrackFinder...
  Track find(Long id);
  Track find(long id);
  List findForAlbum(Long albumID);
Since this is a specific find method for this class, it's implemented in a specific class, such as the track mapper class, rather than in a Layer Supertype (475). As with any finder, there are two parts to the implementation. One defines the SQL for the prepared statement; the other wraps the call to the prepared statement and interprets the results.
class TrackMapper...
  public static final String findForAlbumStatement =
    "SELECT ID, seq, albumID, title " +
    "FROM tracks " +
    "WHERE albumID = ? ORDER BY seq";
  public List findForAlbum(Long albumID) {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
      stmt = DB.prepare(findForAlbumStatement);
      stmt.setLong(1, albumID.longValue());
      rs = stmt.executeQuery();
      List result = new ArrayList();
      while (rs.next())
        result.add(load(rs));
      return result;
    } catch (SQLException e) {
      throw new ApplicationException(e);
    } finally {
      DB.cleanUp(stmt, rs);
    }
  }
The finder calls a load method for each row in the result set. This method has the responsibility of creating the in-memory object and loading it with the data. As in the previous example, some of this can be handled in a Layer Supertype (475), including checking the Identity Map (195) to see if something is already loaded.
Example: Creating an Empty Object (Java)
There are two basic approaches for loading an object. One is to create a fully valid object with a constructor, which is what I've done in the examples above. This results in the following loading code:
class AbstractMapper...
  protected DomainObject load(ResultSet rs) throws SQLException {
    Long id = new Long(rs.getLong(1));
    if (loadedMap.containsKey(id)) return (DomainObject) loadedMap.get(id);
    DomainObject result = doLoad(id, rs);
    loadedMap.put(id, result);
    return result;
  }
  abstract protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException;

class PersonMapper...
  protected DomainObject doLoad(Long id, ResultSet rs) throws SQLException {
    String lastNameArg = rs.getString(2);
    String firstNameArg = rs.getString(3);
    int numDependentsArg = rs.getInt(4);
    return new Person(id, lastNameArg, firstNameArg, numDependentsArg);
  }
The alternative is to create an empty object and load it with the setters later.
class AbstractMapper...
  protected DomainObjectEL load(ResultSet rs) throws SQLException {
    Long id = new Long(rs.getLong(1));
    if (loadedMap.containsKey(id)) return (DomainObjectEL) loadedMap.get(id);
    DomainObjectEL result = createDomainObject();
    result.setID(id);
    loadedMap.put(id, result);   // register before loading so cycles resolve to this object
    doLoad(result, rs);
    result.beActive();           // loading is finished; the dbLoad methods are now guarded
    return result;
  }
  abstract protected DomainObjectEL createDomainObject();
  abstract protected void doLoad(DomainObjectEL obj, ResultSet rs) throws SQLException;

class PersonMapper...
  protected DomainObjectEL createDomainObject() {
    return new Person();
  }
  protected void doLoad(DomainObjectEL obj, ResultSet rs) throws SQLException {
    Person person = (Person) obj;
    person.dbLoadLastName(rs.getString(2));
    person.setFirstName(rs.getString(3));
    person.setNumberOfDependents(rs.getInt(4));
  }
Notice that I'm using a different kind of domain object Layer Supertype (475) here, because I want to control the use of the setters. Let's say that I want the last name of a person to be an immutable field. In this case I don't want to change the value of the field once it's loaded, so I add a status field to the domain object.
class DomainObjectEL...
  private int state = LOADING;
  private static final int LOADING = 0;
  private static final int ACTIVE = 1;
  public void beActive() {
    state = ACTIVE;
  }
I can then check the value of this during a load.
class Person...
  public void dbLoadLastName(String lastName) {
    assertStateIsLoading();
    this.lastName = lastName;
  }

class DomainObjectEL...
  void assertStateIsLoading() {
    Assert.isTrue(state == LOADING);
  }
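To see the guard fire, here's a self-contained sketch of the same two classes. It swaps the Assert.isTrue helper (not shown in this section) for a plain IllegalStateException, and assumes the mapper calls beActive once loading is complete.

```java
// Sketch of the loading-state guard; IllegalStateException replaces Assert.isTrue.
abstract class DomainObjectEL {
    private static final int LOADING = 0;
    private static final int ACTIVE = 1;
    private int state = LOADING;

    void beActive() { state = ACTIVE; }

    void assertStateIsLoading() {
        if (state != LOADING) throw new IllegalStateException("object is no longer loading");
    }
}

class Person extends DomainObjectEL {
    private String lastName;

    // Named so it isn't mistaken for an ordinary setter.
    void dbLoadLastName(String lastName) {
        assertStateIsLoading();
        this.lastName = lastName;
    }

    String getLastName() { return lastName; }
}
```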
What I don't like about this is that we now have a method in the interface that most clients of the Person class can't use. This is an argument for the mapper using reflection to set the field, which will completely bypass Java's protection mechanisms.
Is the status-based guard worth the trouble? I'm not entirely sure. On the one hand, it will catch bugs caused by people calling update methods at the wrong time. On the other hand, are those bugs serious enough to justify the cost of the mechanism? At the moment I don't have a strong opinion either way.