- Views of Data
- A Brief History of Data Architecture
- Advanced Data Management·Meta-data
- Graphics·Data Modeling
- Using Entity/Relationship and Object Models
- Normalization
- Data Modeling Conventions
- Entity/Relationship Model Validation
- The Requirements Analysis Deliverable·Column One
- Data and the Other Columns
- Conclusion
Entity/Relationship Model Validation
When delivering a data model, it is important to validate it. This happens from two directions. First, someone should proofread it to be sure that it follows the modeling rules and conventions. This means first making sure that the syntax is correct, and then making sure that the positional conventions were followed. Then it means making sure that industry standard conventions for representing standard situations were at least consulted. There should be an explicit reason for each departure from the semantic standard.
Second, it should be reviewed with subject matter experts to ensure that the assertions made in the model are in fact true in the business.
Finally, it should be tested for intellectual rigor. Is it consistent and coherent? This is trickier to do.
John Sharp has in fact devised a way to validate the coherence and correctness of a model systematically, using sample data and focusing on the use of unique identifiers. He has presented it in numerous articles for the Journal of Conceptual Modeling (on the World Wide Web at: http://www.inconcept.com/JCM).This involves setting up a table of attributes and populating it with sample data, varying one attribute at a time.
In an example from a course that he teaches, Figure 3.39 shows the Barker version of a model presented in the Federal IDEF1X standard [FIPS, 1993]. Some of the entity types have been slightly renamed for clarity. Dr. Sharp uses this to demonstrate his technique.
Figure 3.39. Validation Exercise—1.
Note that the unique identifiers for each entity type are shown in the diagram as a combination of attributes (denoted by the octothorpe symbol, #) and relationships (denoted by a line across the relationship). Table 3.17 shows the identifiers of each entity type.
Table 3.17. Unique Identifiers
Entity type |
Attribute |
Relationship |
---|---|---|
PERSON |
“ID” |
|
SEMESTER |
“ID” |
|
COURSE |
“Course Number” |
|
COURSE OFFERING |
during SEMESTER |
|
occurrence of COURSE |
||
CLASS SECTION |
“Section Number” |
|
of COURSE OFFERING |
||
CLASS ENROLLMENT |
of PERSON |
|
in CLASS SECTION |
Dr. Sharp's technique begins by expressing a sentence that represents an occurrence of the entity type. In this example, such a sentence might be “Sally (PERSON 234) received an A in Section 2 of Geology 101 (GE101) during the Fall semester of 1999 (F99).” This is represented by the first row of Table 3.18.
Table 3.18. Validation Table
Semester ID |
Course ID |
Section Number |
Person ID |
Grade |
Expert |
Model |
---|---|---|---|---|---|---|
F99 |
GE101 |
2 |
234 |
A |
||
(another) |
GE101 |
2 |
234 |
A |
Y |
Y |
F99 |
(another) |
2 |
234 |
A |
Y |
Y |
F99 |
GE101 |
(another) |
234 |
A |
N |
Y |
F99 |
GE101 |
2 |
(another) |
A |
Y |
Y |
F99 |
GE101 |
2 |
234 |
(another) |
N |
N |
The relation is arranged so that the attributes represent the predicates of a particular entity type. (Predicate, recall, means either an attribute or a relationship to another entity type.)
The rows are arranged to ask, in each case, if a particular predicate can have any other value for the predicate than the one used in the base sentence. For example, in Row Two, is it possible that Sally (PERSON 234) could receive an A in Section 2 of Geology 101 (GE101) during another semester?
At the right are two columns, indicating in one case the answer by an expert in the area under discussion, and in the other case the answer dictated by the model. If the answers don't match, the model must be wrong.
In the model, since “Semester ID” is part of the unique identifier of COURSE OFFERING, A COURSE OFFERING, can be for as many SEMESTERS as we want. The expert also says that, at least under current rules, a PERSON can take the same SECTION of the same COURSE more than once. (Note that the rules may change.) Similarly, the model and the expert agree, as they do in all the rows but one—the fourth.
In the fourth row, the question is, can Sally (PERSON 234) receive an A in another section of Geology 101 (GE101) during the Fall semester of 1999? It is highly unlikely that she is taking two sections at once, but this is not prevented by the model. In this case, the model says that, since SECTION is part of the identifier of “grade” (in GRADED ENROLLMENT), then it is possible to take as many sections of a course as you want, and there is nothing to prevent the sections from being in the same semester. However, the expert points out that this is not possible. You can take only one section of a course at a time. Hence, the model is wrong, or it is at least missing a business rule.
The corrected model is shown in Figure 3.40. Note that CLASS ENROLLMENT is now identified only by COURSE OFFERING, not by CLASS SECTION. It is still a fact that the CLASS ENROLLMENT is in a particular CLASS SECTION, but now a given occurrence of CLASS ENROLLMENT in a COURSE OFFERING may be in only one CLASS SECTION. Note that a business rule must be still be specified: Since each CLASS ENROLLMENT must be in a CLASS SECTION that must be of a COURSE OFFERING, and each CLASS ENROLLMENT must be in a COURSE OFFERING directly, the CLASS OFFERING for a single CLASS ENROLLMENT must be the same for both paths.
Figure 3.40. Validation Exercise—2.
This validation procedure is a very important way to verify that your model says exactly what you think it does. In addition, it identifies business rules that may not be able to be portrayed in the model at all. Note that this validation cannot be done with an object model, since object models do not specify which attributes and relationships are unique identifiers.