Implementing Models with Relational Databases
Mapping Models to Tables
Article 2 of this series, "Modeling Database Applications," discussed the importance of models and their use for database applications. The Unified Modeling Language (UML) is a popular notation for modeling databases, and we continue discussing its use in this article.
There are systematic rules for converting a model to a database structure. It is helpful to understand the rules, even though you would normally apply them with a tool, in practice. We sketch out the basic rules here.
Identity
The first step in implementing a model is to deal with identity. We first define several terms:
A candidate key is a combination of columns that uniquely identifies each row in a table. The combination must be minimal and should not include any columns that are not needed for unique identification. No column in a candidate key can be null.
A primary key is a candidate key that is preferentially used to access the records in a table. A table can have at most one primary key; normally, each table should have a primary key.
A foreign key is a reference to a candidate key, and is the glue that binds tables. A foreign key must have a value for all columns in a candidate key, or it must be wholly null. Foreign keys are used to implement relationships and generalizations.
Each foreign key should refer only to a table's primary key (not to some other candidate key). There are no disadvantages and several advantages to doing this. Multiple referents complicate the database structure, and it is good style to have consistency in an implementation. Searching and comparisons are also more difficult if there are multiple access routes to a record.
There are two basic approaches to defining primary keys:
Existence-based identity. An artificial number attribute is added to each entity type table and made the primary key. The primary key for each relationship table consists of identifiers from the related entity types.
Existence-based identifiers have the advantage of being a single attribute, small, and uniform in size. Most relational DBMSs (RDBMSs) provide sequence numbers so that they can allocate identifiers efficiently. Existence-based identifiers can make a database more difficult to read during debugging and maintenance. IDs also complicate the merging of databases; sometimes ID values contend and must be reassigned. Artificial numbers should only be used internally in applications and not be displayed to users.
Value-based identity. Some combination of real-world attributes identifies each entity. The primary key for each relationship table consists of primary keys from the related entity types.
Value-based identity has different trade-offs. Primary keys have intrinsic meaning, making it easier to debug the database. On the downside, value-based primary keys can be difficult to change. One change may propagate to many foreign keys. Some entity types do not have natural real-world identifiers.
We recommend that you use existence-based identity for relational databases with more than 30 entity types. The uniformity and simplicity of existence-based identity outweighs any additional debugging effort. Both existence-based and value-based identity are viable options for small applications.
Entity Types
Normally, you should map each entity type to a table and each attribute to a column (see Figure 1). You can add columns for an existence-based identifier, buried relationships, and generalization discriminators (to be explained). The primary key is shown in bold font.
Figure 1 Entity type. Make each entity type a table.
Relationships
The design rules for relationship types depend on the multiplicity. Figure 2 shows the rule for many-to-many relationships. The relationship type is promoted to a table, and the primary key is the combination of the primary keys for the related entity types. The ellipses (...) denote attributes that are not shown.
Figure 2 Many-to-many relationship. Promote each one to a table.
Figure 3 shows the recommended rule for one-to-many relationships. The relationship type becomes a foreign key buried in the table for the "many" entity type.
Figure 3 One-to-many relationship. Bury each one as a foreign key in the "many" entity type table.
One-to-one relationships seldom occur. You can handle these by burying a foreign key in either entity type table.
Generalization
The simplest approach is to map the general entity type and each specific entity type to a table, as Figure 4 shows. The primary key names may vary, but an entity should have the same primary key value throughout a generalization. The discriminator (activityType) is an attribute that indicates the appropriate specific record for each general record. Applications must enforce the partition between specific entities because an RDBMS does not. For example, an RDBMS would let an Activity record have both a FlightActivity record and an OtherActivity record. An RDBMS will not stop such a multiple reference.
Figure 4 Generalization. Create separate tables for the general entity type and each specific entity type.
Indexes
The final step is to tune the database by adding indexes. Most RDBMSs create indexes as a side effect of SQL primary-key and candidate-key constraints. You should also create an index for each foreign key that is not subsumed by a primary-key or candidate-key constraint.
These indexes are important. Indexes on foreign keys and primary keys let you combine tables quickly. Without these indexes, users will become frustrated. Indexes should be an integral part of a database design because they are straightforward to include, and there is no good reason to defer them.
The database administrator (DBA) may define additional indexes for frequent queries. The DBA may also use product-specific tuning mechanisms.
Normal Forms
A normal form is a guideline for relational database design that increases data consistency. Because tables satisfy higher levels of normal forms, they are less likely to store redundant data and contradictory data. Normal forms are not fiat rules. Developers can violate them for good cause, such as to increase performance for a bottleneck or for a database that is read and seldom updated. Such a relaxation is called denormalization. The important issue with normal forms is to violate them deliberately and only when necessary.
Normal forms were first used in the 1970s and 1980s. At that time, developers designed databases by creating a list of desired fields, which they then had to organize into meaningful groups before storing them in a database. That was the purpose of normal forms.
Normal forms organize fields into groups according to dependencies between fields. For example, both the airline and account number are needed to determine the current balance of Frequent Flyer points. Unfortunately, it is easy to overlook such dependencies. If any are missed, the resulting database structure may be flawed.
Modeling provides a better way to prepare databases. Instead of focusing on the fine granularity of fields, developers think in terms of entity types, which naturally organize fields into meaningful groups. Models do not diminish the validity of normal forms—normal forms apply regardless of the development approach.
However, modeling does eliminate the need to check normal forms. If developers build a sound model, it will intrinsically satisfy normal forms. The converse also holds—a poor model is unlikely to satisfy normal forms. Furthermore, if developers cannot build a sound model, they will probably be unable to find all the dependencies that are required for checking normal forms. It is less difficult to build models than to find all the dependencies.
The bottom line is that developers can still check normal forms if they want after modeling, but such a check is unnecessary.
Summary
Figure 5 summarizes the mapping rules for RDBMSs.
Figure 5 Summary of RDMS mapping rules.