Database-Design Methods
Traditional Design Methods
In general, traditional methods of database design incorporate three phases: requirements analysis, data modeling, and normalization.
The requirements-analysis phase involves an examination of the business being modeled, interviews with users and management to assess the current system and to analyze future needs, and an assessment of information requirements for the business as a whole. This process is relatively straightforward, and, indeed, the design process presented in this book follows the same line of thinking.
The data-modeling phase involves modeling the database structure using a data-modeling method, such as entity-relationship (ER) diagramming, semantic-object modeling, object-role modeling, or UML modeling. Each of these modeling methods provides a means of visually representing various aspects of the database structure, such as the tables, table relationships, and relationship characteristics. In fact, the modeling method used in this book is a basic version of ER diagramming. Figure 2.1 shows an example of a basic ER diagram.
Each data-modeling method incorporates a set of diagramming symbols used to represent a database’s structure and characteristics. For example, the diagram in Figure 2.1 provides information on several aspects of the database.
The rectangles represent two tables called AGENTS and CLIENTS.
The diamond represents a relationship between these two tables, and the “1:N” within the diamond indicates that it is a one-to-many relationship.
The vertical line next to the AGENTS table indicates that a client must be associated with only one agent, and the circle and “crow’s foot” next to the CLIENTS table indicates that an agent doesn’t necessarily have to be associated with a client, but can be associated with one or more.
Fields are also defined and associated with the appropriate tables during the data-modeling phase. Each table is assigned a primary key, various levels of data integrity are identified and implemented, and relationships are established via foreign keys. After the initial table structures are complete and the relationships have been established according to the data model, the database is ready to go through the normalization phase.
Normalization is the process of decomposing large tables into smaller ones in order to eliminate redundant data and duplicate data and avoid problems with inserting, updating, or deleting data. During the normalization process, table structures are tested against normal forms and then modified if any of the aforementioned problems are found. A normal form is a specific set of rules that can be used to test a table structure to ensure that it is sound and free of problems. There are a number of normal forms, and each one is used to test for a particular set of problems. The normal forms currently in use are First Normal Form, Second Normal Form, Third Normal Form, Fourth Normal Form, Fifth Normal Form, Sixth Normal Form, Boyce-Codd Normal Form, and Domain/Key Normal Form.
The Design Method Presented in This Book
The design method that I use in this book is one that I’ve developed over the years. It incorporates a requirements analysis and a simple ER-diagramming method to diagram the database structure. However, it does not incorporate the traditional normalization process or involve the use of normal forms. The reason is simple: Normal forms can be confusing to anyone who has not taken the time to study formal relational database theory. For example, examine the following definition of Third Normal Form:
A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.1
This description is relatively meaningless to a reader who is unfamiliar with the terms relation, 3NF, 2NF, non-key attribute, non-transitively dependent, and primary key.
The process of designing a database is not and should not be hard to understand. As long as the process is presented in a straightforward manner and each concept or technique is clearly explained, anyone should be able to design a database properly. For example, the following definition is derived from the results of using Third Normal Form against a table structure, and I believe most people will find it clear and easy to understand:
A table should have a field that uniquely identifies each of its records, and each field in the table should describe the subject that the table represents.
The process I used to formulate this definition is the same one I used to develop my entire design methodology.