Absolute Beginner's Guide to Database Normalization
Chapter Hightlights
-
What Does It Mean to Normalize a Database?
-
Steps to Normalize Your Data Model
-
Denormalize DataWhen Does It Make Sense to Break the Rules?
-
Normalization AppliedReview the TEB Database and Refine the Design
-
What You Have Learned
In the last chapter, you were introduced to basic database design concepts. In this chapter, you will build on those skills. If you recall one of the basic mantras of this book, it takes several iterations of work to reach an optimal database design. After your initial complement of tables has been created, the next step involves outfitting the tables with columns. At this point, the only columns that exist are the primary and foreign keys necessary to support the relations between the tables. The question at this point is, "What columns do you include in a table?" Some of the columns to include can be determined through the application of common sense. Other columns to include are not as intuitive. As you will see, whether to include a column can determine whether new tables have to be created. If you think all the tables for the Time Entry and Billing (TEB) Database have been created, you are in for a surprise! The process of determining which columns go in a table is called normalization. The normalization process is the focus of this chapter, and upon completing this chapter, the TEB Database design will be complete.
What Does It Mean to Normalize a Database?
Database normalization can best be characterized as the process of organizing a database. With the question of what out of the way, let's turn to the question of why. The goal of normalization is to reduce problems with data consistency by reducing redundancy. Sound confusing? The concept of normalization is probably easier understood by way of a simple example. To illustrate, let's again turn to the Northwind Traders Database that ships with Access.
Figure 4.1 shows the ERD (Entity Relationship Diagram) for the Northwind Traders Database that was introduced in Chapter 2, "The Anatomy of a Real Database." Notice the relationship between the Customers and Orders tables. As a quick review of how relationships work in a relational database, the primary key of the parent table is carried in the child table as the foreign key. In this case, the CustomerID field is carried in the Orders table. This is how order records for a specific customer can be associated with that customer.
Why then not carry other fields from the Customers table? As you will see later, sometimes you might elect to carry other fields from the parent table to the child table. As a general rule, however, you will not want to do this. For example, what if you decide you are going to carry the CompanyName field in the Orders table and the company name changes? Not only would you have to update the Customers table, you would also have to update the Orders table and any other table in which CompanyName exists. Continuing with this example, what if the various locations of CompanyName were not consistently updated? Which version would reflect the current, most accurate version of CompanyName? If you were asked to prepare a report that needed to include CompanyName, which table should you use as the source of CompanyName?
Figure 4.1 The Entity Relationship Diagram for the Northwind Traders Database shows a relationship between customers and orders.
It seems clear that if you have only one instance of a given data element in a database, it leaves nothing to interpretation and guesswork. When you update CompanyName in the Customers table, you can be sure that every report that relies on CompanyName will be accurate and up to date. Why? Because CompanyName is not carried redundantly in the database, and as a result, consistency is ensured. The same concept can be seen throughout the data model in Figure 4.1. Greater consistency through the elimination of redundancythis is the goal of normalization!