- Views of Data
- A Brief History of Data Architecture
- Advanced Data Management: Meta-data
- Graphics: Data Modeling
- Using Entity/Relationship and Object Models
- Normalization
- Data Modeling Conventions
- Entity/Relationship Model Validation
- The Requirements Analysis Deliverable: Column One
- Data and the Other Columns
- Conclusion
The Requirements Analysis Deliverable—Column One
Entity Types and Relationships, with Narrative
A data model is a major product of the requirements analysis process. At the very least, it should include an architect's version, showing the conceptual structure of the company's data. Where appropriate, user views of the data can also be included, with their links to the conceptual model documented.
The data model can be an object role model, a conceptual entity/relationship diagram, or an object model showing business-oriented classes. If an entity/relationship diagram is used, it should be drawn in either Barker or information engineering notation, to ensure readability. If a UML business class model is produced, it must use a stripped-down version of UML that shows only classes, associations, and sub-types, with relationship roles labeled in both directions.
Regardless of notation, the relationships should be named as described in this chapter, so that they can be converted into conventional English sentences. Each entity type name must be a common natural language term. No abbreviations, acronyms, or references to computer terms are permitted.
In the CASE tool supporting your efforts, each entity type must be described by its name and definition. Optionally, estimates of the number of occurrences, synonyms, and other information may also be included. (See Chapter 2 for a description of how this information can be used to estimate the data capacity required.)
A narrative should accompany the model. Strictly speaking, this is a collection of definitions and relationship sentences, but this structure can be camouflaged, making the narrative read like paragraphs of normal text. The idea is that anyone could read the text and understand it without necessarily understanding the model graphics.
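As a minimal sketch of how the skeleton of such a narrative can be generated mechanically, the Python below assumes the Barker-style reading convention ("Each A must be / may be <relationship name> one and only one / one or more B") and a hypothetical PURCHASE ORDER and PARTY model. The class and function names are illustrative only, not features of any CASE tool. Each relationship is recorded once per direction, so both readings appear in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One direction of a relationship, read from `source` toward `target`."""
    source: str         # entity type the sentence starts from
    name: str           # relationship name, e.g. "issued to"
    mandatory: bool     # True -> "must be", False -> "may be"
    many: bool          # True -> "one or more", False -> "one and only one"
    target: str         # singular name of the related entity type
    target_plural: str  # plural form, used when many is True


def relationship_sentence(end: RelationshipEnd) -> str:
    """Render one relationship direction as a Barker-style English sentence."""
    modality = "must be" if end.mandatory else "may be"
    if end.many:
        cardinality, target = "one or more", end.target_plural
    else:
        cardinality, target = "one and only one", end.target
    return f"Each {end.source} {modality} {end.name} {cardinality} {target}."


def narrative(definitions: dict[str, str], ends: list[RelationshipEnd]) -> str:
    """One paragraph per entity type: its definition, then every relationship read from it."""
    paragraphs = []
    for entity, definition in definitions.items():
        sentences = [f"Each {entity} is {definition}"]
        sentences += [relationship_sentence(e) for e in ends if e.source == entity]
        paragraphs.append(" ".join(sentences))
    return "\n\n".join(paragraphs)


# Hypothetical model: two entity types and both directions of one relationship.
definitions = {
    "PURCHASE ORDER": "a request from the company to a supplier for goods or services.",
    "PARTY": "a person or organization of interest to the company.",
}
ends = [
    RelationshipEnd("PURCHASE ORDER", "issued to", True, False, "PARTY", "PARTIES"),
    RelationshipEnd("PARTY", "the recipient of", False, True,
                    "PURCHASE ORDER", "PURCHASE ORDERS"),
]

print(narrative(definitions, ends))
```

The output is only the raw material: "Each PURCHASE ORDER must be issued to one and only one PARTY." and "Each PARTY may be the recipient of one or more PURCHASE ORDERS." An analyst would still smooth these sentences into ordinary paragraphs.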
The model's documentation should also include the business rules (captured for Column Six) that constrain it. At the basic level, these include rules arising from the configuration of the data model itself, such as “no loops in a recursive structure”, or the requirement that, where there are two paths between two entity types, the occurrences reached at the ends of the two paths must be the same (or must be different). In addition, there will be many rules that originate in the business itself.
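A rule such as "no loops in a recursive structure" can be checked against sample data. The sketch below assumes a hypothetical recursive relationship, "each EMPLOYEE may be managed by one and only one EMPLOYEE", stored as a mapping from each occurrence to its manager (None for the top of the hierarchy). It follows manager links from every occurrence and reports the first loop it finds.

```python
from typing import Optional


def find_loop(parent_of: dict[str, Optional[str]]) -> Optional[list[str]]:
    """Return the occurrences forming a loop in a one-parent recursive
    relationship, or None if the structure is a proper hierarchy."""
    cleared: set[str] = set()  # occurrences already known to lead to a root
    for start in parent_of:
        path: list[str] = []
        on_path: set[str] = set()
        node: Optional[str] = start
        while node is not None and node not in cleared:
            if node in on_path:
                # The walk came back to an occurrence already on this path.
                return path[path.index(node):]
            path.append(node)
            on_path.add(node)
            node = parent_of.get(node)  # an unknown occurrence is treated as a root
        cleared.update(path)  # every occurrence on this path reaches a root
    return None


# Hypothetical EMPLOYEE occurrences and their managers.
managed_by = {"Ann": None, "Bob": "Ann", "Cy": "Bob", "Dee": "Cy"}
print(find_loop(managed_by))  # None: a proper hierarchy

managed_by["Ann"] = "Dee"     # Ann now reports, indirectly, to herself
print(find_loop(managed_by))  # ['Ann', 'Dee', 'Cy', 'Bob']
```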
Attributes
In the course of developing a model, the entity types and relationships are defined first, with perhaps a few attributes defined for each entity type, to clarify its meaning. By the end of the requirements project, however, all known attributes must be specified, with each defined by its name and description, format, optionality, and validation rules.
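One way to track whether every attribute is fully specified by the end of the project is to keep each definition as a structured record and report what is still missing. The Python below is a sketch under that assumption. The `AttributeSpec` fields mirror the characteristics listed above, and the sample attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeSpec:
    entity_type: str
    name: str
    description: str = ""
    format: str = ""                 # e.g. "text, up to 100 characters"
    optional: Optional[bool] = None  # None means "not yet decided"
    validation_rule: str = ""        # e.g. the name of a domain


REQUIRED = ("name", "description", "format", "optional", "validation_rule")


def missing_characteristics(spec: AttributeSpec) -> list[str]:
    """Names of required characteristics that are still blank or undecided."""
    return [field for field in REQUIRED if getattr(spec, field) in ("", None)]


def incomplete_attributes(specs: list[AttributeSpec]) -> dict[str, list[str]]:
    """Map "ENTITY TYPE.attribute" to what is missing, for unfinished attributes only."""
    report = {}
    for spec in specs:
        missing = missing_characteristics(spec)
        if missing:
            report[f"{spec.entity_type}.{spec.name}"] = missing
    return report


specs = [
    AttributeSpec("PARTY", "name", "The name by which the party is known.",
                  "text, up to 100 characters", False, "must not be blank"),
    AttributeSpec("PURCHASE ORDER", "order date"),  # identified early, not yet described
]
print(incomplete_attributes(specs))
```

Note that `optional=False` counts as a decision (the attribute is mandatory). Only `None` is reported as missing.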
In assigning attributes, be sure to recognize that some things that appear to be attributes are really relationships to other entity types. Bob Schmidt [Schmidt, 1999] points out that what an entity type really has are predicates. A predicate is a piece of information about an entity type. It may be an attribute or a relationship to another entity type. The important thing at the beginning of the effort is to identify the predicates. Only once they have all been laid out should you decide which are attributes and which are in fact relationships to other entity types.
Object role modeling takes the same approach.
For each candidate attribute, ask: does it have attributes of its own? If so, it is really a relationship to a new entity type.
In fact, you should have relatively few attributes for each entity type. You may have a surrogate key, a name or description, and possibly some cost or date data. Nearly every other attribute you may imagine is probably a relationship to another entity type.
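The test in the previous paragraphs can be illustrated with a small before-and-after sketch. The CUSTOMER and CITY classes below are hypothetical and use Python dataclasses only as a convenient notation. "City" first appears as a customer attribute, but because it has attributes of its own (here, a population), it becomes an entity type that each customer is related to.

```python
from dataclasses import dataclass


# Before: city facts are attributes of CUSTOMER, so they are repeated
# for every customer in the same city.
@dataclass
class CustomerWithCityAttributes:
    name: str
    city_name: str
    city_population: int  # describes the city, not the customer


# After: CITY is an entity type, and CUSTOMER has a relationship to it.
@dataclass
class City:
    name: str
    population: int


@dataclass
class Customer:
    name: str
    located_in: City  # each CUSTOMER must be located in one and only one CITY


# Hypothetical data: two customers share a single CITY occurrence.
springfield = City("Springfield", 30000)
customers = [Customer("Acme Supply", springfield), Customer("Bolt Works", springfield)]

springfield.population = 31000  # recorded once, visible through both relationships
print([c.located_in.population for c in customers])
```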
As with entity types, each attribute name must be a common natural language term. If UML is used and attributes are shown on the model, show the attribute name only. Be sure to record the attribute's other characteristics (description, format, optionality, and validation rules) behind the scenes, in the CASE tool's repository.
Domains
It is a good practice to assign a domain to every attribute. A domain is a validation rule for an attribute, establishing the context for its values. It may consist of a list of values, a range, or an expression of some kind, and it may also specify a standard format. Domains are convenient because, once a validation rule has been defined, it can be applied easily to many attributes. They also add discipline to the attribution process, because they require the analyst to analyze carefully just what each attribute means.
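To show the "define once, apply to many attributes" idea, the sketch below builds three kinds of domain (a list of values, a range, and a format expression) and assigns them to attributes. All domain names, values, and attribute names are hypothetical, and the regular expression is only one possible way to express a format rule.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Domain:
    name: str
    is_valid: Callable[[Any], bool]


def list_domain(name: str, values) -> Domain:
    """Domain defined by an explicit list of allowed values."""
    allowed = frozenset(values)
    return Domain(name, lambda v: v in allowed)


def range_domain(name: str, low: float, high: float) -> Domain:
    """Domain defined by an inclusive numeric range."""
    return Domain(name, lambda v: isinstance(v, (int, float)) and low <= v <= high)


def pattern_domain(name: str, regex: str) -> Domain:
    """Domain defined by a standard format, expressed as a regular expression."""
    compiled = re.compile(regex)
    return Domain(name, lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None)


# Hypothetical domains, each defined once...
PERCENTAGE = range_domain("PERCENTAGE", 0, 100)
ORDER_STATUS = list_domain("ORDER STATUS", {"open", "shipped", "closed"})
US_ZIP_CODE = pattern_domain("US ZIP CODE", r"\d{5}(-\d{4})?")

# ...and applied to many attributes.
ATTRIBUTE_DOMAINS = {
    "ORDER LINE.discount rate": PERCENTAGE,
    "SUPPLIER.tax rate": PERCENTAGE,
    "PURCHASE ORDER.status": ORDER_STATUS,
    "ADDRESS.postal code": US_ZIP_CODE,
}


def validate(attribute: str, value: Any) -> bool:
    """Check a value against the domain assigned to the attribute."""
    return ATTRIBUTE_DOMAINS[attribute].is_valid(value)


print(validate("SUPPLIER.tax rate", 8.25), validate("ORDER LINE.discount rate", 120))
```

Changing the PERCENTAGE rule in one place changes it for every attribute assigned to that domain.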
In addition, many domains are actually specified as …TYPE entity types. While defining these as entity types/tables adds flexibility and robustness to the final system, the values should be determined up front. WORK ORDER TYPE, REPORTING RELATIONSHIP TYPE, and so forth are effectively domains, and it is the analyst's job to provide at least an initial set of values for these.
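As an illustration of a …TYPE entity type acting as a domain, the sketch below implements WORK ORDER TYPE as a reference table in an in-memory SQLite database, seeded with an initial set of values. The table names, codes, and descriptions are hypothetical. The foreign key rejects any work order whose type is not in the table, and a new value is added as data, with no schema change. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical initial values an analyst might record for WORK ORDER TYPE.
INITIAL_WORK_ORDER_TYPES = [
    ("PREVENTIVE", "Scheduled maintenance performed before a failure"),
    ("CORRECTIVE", "Repair performed after a failure"),
    ("INSPECTION", "Examination of equipment without repair"),
]


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE work_order_type (code TEXT PRIMARY KEY, description TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE work_order (id INTEGER PRIMARY KEY, "
        "type_code TEXT NOT NULL REFERENCES work_order_type (code))"
    )
    conn.executemany("INSERT INTO work_order_type VALUES (?, ?)", INITIAL_WORK_ORDER_TYPES)
    return conn


def add_work_order(conn: sqlite3.Connection, type_code: str) -> bool:
    """Return True if the work order is accepted, False if its type is not in the domain."""
    try:
        conn.execute("INSERT INTO work_order (type_code) VALUES (?)", (type_code,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
accepted = add_work_order(conn, "PREVENTIVE")  # in the initial set of values
rejected = add_work_order(conn, "OVERHAUL")    # not (yet) a WORK ORDER TYPE
# Extending the domain is a data change, not a schema change.
conn.execute("INSERT INTO work_order_type VALUES (?, ?)",
             ("OVERHAUL", "Complete rebuild of equipment"))
accepted_later = add_work_order(conn, "OVERHAUL")
print(accepted, rejected, accepted_later)
```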
Unique Identifiers
As shown above, a data model cannot be validated without knowledge of each entity type's unique identifiers. For this reason, make sure that the attributes and relationships that together identify each entity type are properly specified.
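A proposed unique identifier can be tested against sample occurrences. The sketch below assumes a hypothetical LINE ITEM entity type identified by its relationship to PURCHASE ORDER plus a line number, with each occurrence held as a dictionary. It reports identifier values shared by more than one occurrence. The check shows that the line number alone is not an identifier, while the combination of relationship and attribute is.

```python
from collections import Counter


def duplicate_identifiers(occurrences: list[dict], identifier: tuple[str, ...]) -> list[tuple]:
    """Return identifier values shared by more than one occurrence.

    `identifier` names the attributes and relationships that together
    are proposed as the entity type's unique identifier."""
    counts = Counter(tuple(occ[part] for part in identifier) for occ in occurrences)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical LINE ITEM occurrences; "purchase order" stands for the
# relationship to PURCHASE ORDER.
line_items = [
    {"purchase order": "PO-100", "line number": 1, "product": "bolt"},
    {"purchase order": "PO-100", "line number": 2, "product": "nut"},
    {"purchase order": "PO-200", "line number": 1, "product": "washer"},
]

print(duplicate_identifiers(line_items, ("line number",)))                    # [(1,)]
print(duplicate_identifiers(line_items, ("purchase order", "line number")))  # []
```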
Referential Integrity
Although most CASE tools do not support this annotation, examine the data model and, for every relationship, note at the “one” end the referential integrity rule for that relationship. Is it cascade delete, meaning that if the parent is deleted, all of that parent's children will also be deleted? Is it restricted, meaning that the parent cannot be deleted if it has children? Or is it nullify, meaning that the relationship from children to parent is optional, and deleting the parent means that the value of the relationship for each remaining child is reset to “null”?
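To make the three rules concrete at the implementation level, the sketch below maps them onto SQLite's `ON DELETE CASCADE`, `ON DELETE RESTRICT`, and `ON DELETE SET NULL` foreign key actions. This mapping to one particular database is an assumption for illustration. The generic `parent` and `child` tables are hypothetical. For the nullify rule, the child's foreign key column must allow nulls, matching an optional relationship.

```python
import sqlite3

# SQLite spellings of cascade delete, restricted, and nullify.
RULES = ("CASCADE", "RESTRICT", "SET NULL")


def delete_parent(rule: str) -> tuple[bool, list[tuple]]:
    """Build a parent with two children under the given ON DELETE rule, try to
    delete the parent, and return (parent_deleted, remaining child rows)."""
    if rule not in RULES:
        raise ValueError(f"unknown rule: {rule}")
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, "
        f"parent_id INTEGER REFERENCES parent (id) ON DELETE {rule})"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, 1)", [(10,), (11,)])
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")
        parent_deleted = True
    except sqlite3.IntegrityError:  # raised when RESTRICT blocks the delete
        parent_deleted = False
    children = conn.execute("SELECT id, parent_id FROM child ORDER BY id").fetchall()
    conn.close()
    return parent_deleted, children


for rule in RULES:
    print(rule, delete_parent(rule))
```

Under CASCADE, the parent and both children disappear. Under RESTRICT, the delete is refused and nothing changes. Under SET NULL, the parent is deleted and both children remain with a null parent reference.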