Working with Multiple-Table Queries
In this chapter
- Relational Database Fundamentals
- Types of Relational Models
- Enforcing Referential Integrity
- Establishing Table Relationships
- Working with Multiple Tables in a Query
- Creating Other Types of Joins
- Creating a Unique Values Query
- Case Study
Most database applications (and all well-designed database applications) store their information in multiple tables. Although most of these tables have nothing to do with each other (for example, tables of customer information and employee payroll data), it's likely that at least some of the tables do contain related information (such as tables of customer information and customer orders).
Working with multiple, related tables in a query presents you with two challenges: You need to design your database so that the related data is accessible, and you need to set up links between the tables so that the related information can be retrieved and worked with quickly and easily in the query design window. This chapter tackles both challenges and shows you how to exploit the full multiple-table powers of Access.
Relational Database Fundamentals
Why do you need to worry about multiple tables, anyway? Isn't it easier to work with one large table instead of two or three medium-sized ones? To answer these questions and demonstrate the problems that arise when you ignore relational database models, take a look at a simple example: a table of sales leads.
The Pitfalls of a Nonrelational Design
Table 3.1 outlines the structure of a simple table (named Leads) that stores data on sales leads.
Table 3.1 A Structure of a Simple Sales Leads Table (Leads)
Field | Description
LeadID | The primary key.
FirstName | The contact's first name.
LastName | The contact's last name.
Company | The company that the contact works for.
Address | The company's address.
City | The company's city.
State | The company's state.
Zip | The company's ZIP code.
Phone | The contact's phone number.
Fax | The contact's fax number.
Source | Where the lead came from.
Notes | Notes or comments related to the sales lead.
This structure works fine until you need to add two or more leads from the same company (a not-uncommon occurrence). In this case, you end up with repeating information in the Company, Address, City, and State fields. (The Zip field also repeats, as do, in some cases, the Phone, Fax, and Source fields.)
All this repetition makes the table unnecessarily large, which is bad enough, but it also creates two major problems:
- During data entry, the repeated information must be entered for each lead from the same company.
- If any of the repeated information changes (such as the company's name or address), each corresponding record must be changed.
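The maintenance problem can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Access, it includes only some of the Leads fields, and the sample company and contacts are made up.

```python
import sqlite3

# Illustration only: SQLite stands in for Access; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Leads (
           LeadID INTEGER PRIMARY KEY,
           FirstName TEXT, LastName TEXT,
           Company TEXT, Address TEXT, City TEXT, State TEXT, Zip TEXT
       )"""
)

# Two leads from the same company: the company fields are entered twice.
conn.executemany(
    "INSERT INTO Leads (FirstName, LastName, Company, Address, City, State, Zip) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ann", "Lee", "Acme Corp", "1 Main St", "Springfield", "IL", "62701"),
        ("Bob", "Ruiz", "Acme Corp", "1 Main St", "Springfield", "IL", "62701"),
    ],
)

# When the company moves, every record for that company has to be changed.
cur = conn.execute(
    "UPDATE Leads SET Address = ? WHERE Company = ?", ("9 Oak Ave", "Acme Corp")
)
rows_changed = cur.rowcount
print(rows_changed)  # 2
```

One address change touches as many records as there are leads at the company, and a missed record leaves the table with two conflicting addresses for the same firm.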
One way to eliminate the repetition and solve the data entry and maintenance inefficiencies is to change the table's focus. As it stands, each record in the table identifies a specific contact in a company. But it's the company information that repeats, so it makes some sense to allow only one record per company. You can then include separate fields for each sales lead within the company. The new structure might look something like the one shown in Table 3.2.
Table 3.2 A Revised, Company-Centered Structure of the Sales Leads Table
Field | Description
LeadID | The primary key.
Company | The company's name.
Address | The company's address.
City | The company's city.
State | The company's state.
Zip | The company's ZIP code.
Phone | The company's phone number.
Fax | The company's fax number.
First_1 | The first name of contact #1.
Last_1 | The last name of contact #1.
Source_1 | Where the lead for contact #1 came from.
Notes_1 | Notes or comments related to contact #1.
First_2 | The first name of contact #2.
Last_2 | The last name of contact #2.
Source_2 | Where the lead for contact #2 came from.
Notes_2 | Notes or comments related to contact #2.
First_3 | The first name of contact #3.
Last_3 | The last name of contact #3.
Source_3 | Where the lead for contact #3 came from.
Notes_3 | Notes or comments related to contact #3.
In this setup, the company information appears only once, and the contact-specific data (I'm assuming this involves only the first name, last name, source, and notes) appears in separate field groups (for example, First_1, Last_1, Source_1, and Notes_1). This solves the earlier problems, but at the cost of a new dilemma: The structure as it stands will hold only three sales leads per company. Of course, it's entirely conceivable that a large firm might have more than three contacts, perhaps even dozens. This raises two unpleasant difficulties:
- If you run out of repeating groups of contact fields, new ones must be added. Although this might not be a problem for the database designer, most data-entry clerks generally don't have access to the table design (nor should they).
- Empty fields take up as much disk real estate as full ones, so making room for, say, a dozen contacts from one company means that all the records that have only one or two contacts have huge amounts of wasted space.
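Both difficulties show up quickly in a small experiment. The sketch below is an assumption-laden illustration, not Access itself: it uses Python's sqlite3 module, keeps only the Company field from the company data, and uses invented sample values. It counts the contact fields left empty for a one-contact company and then tries to store a fourth contact.

```python
import sqlite3

# Illustration only: SQLite stands in for Access; the sample values are invented.
conn = sqlite3.connect(":memory:")

# Build the three repeating field groups: First_1 ... Notes_3.
contact_fields = [
    f"{name}_{n}" for n in (1, 2, 3) for name in ("First", "Last", "Source", "Notes")
]
columns = ", ".join(f"{field} TEXT" for field in contact_fields)
conn.execute(
    f"CREATE TABLE Leads (LeadID INTEGER PRIMARY KEY, Company TEXT, {columns})"
)

# A company with a single contact leaves the other two field groups empty.
conn.execute(
    "INSERT INTO Leads (Company, First_1, Last_1, Source_1, Notes_1) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Acme Corp", "Ann", "Lee", "Trade show", "Call back in May"),
)
row = conn.execute(f"SELECT {', '.join(contact_fields)} FROM Leads").fetchone()
empty_fields = sum(value is None for value in row)
print(empty_fields)  # 8

# A fourth contact has nowhere to go until someone changes the table design.
try:
    conn.execute("UPDATE Leads SET First_4 = 'Dee' WHERE Company = 'Acme Corp'")
    fourth_contact_fits = True
except sqlite3.OperationalError:
    fourth_contact_fits = False
print(fourth_contact_fits)  # False
```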
How a Relational Design Can Help
To solve the twin problems of repetition between records and repeated field groups within records, you need to turn to the relational database model. This model was developed by Dr. Edgar Codd of IBM in the early 1970s. It was based on a complex relational algebra theory, so the pure form of the rules and requirements for a true relational database setup is quite complicated and decidedly impractical for business applications. The next few sections look at a simplified version of the model.
Step 1: Separate the Data
After you know which fields you need to include in your database application, the first step in setting up a relational database is to divide these fields into separate tables where the "theme" of each table is unique. In technical terms, each table must be composed of only entities (that is, records) from a single entity class.
For example, the table of sales leads you saw earlier dealt with data that had two entity classes: the contacts and the companies they worked for. Every one of the problems encountered with that table can be traced to the fact that we were trying to combine two entity classes into a single table. So the first step toward a relational solution is to create separate tables for each class of data. Table 3.3 shows the table structure of the contact data (the Contacts table) and Table 3.4 shows the structure of the company information (the Companies table). Note, in particular, that both tables include a primary key field.
Table 3.3 The Structure of the Contacts Table
Field | Description
ContactID | The primary key.
FirstName | The contact's first name.
LastName | The contact's last name.
Phone | The contact's phone number.
Fax | The contact's fax number.
Source | Where the lead came from.
Notes | Notes or comments related to the sales lead.
Table 3.4 The Structure of the Companies Table
Field | Description
CompanyID | The primary key.
CompanyName | The company's name.
Address | The company's address.
City | The company's city.
State | The company's state.
Zip | The company's ZIP code.
Phone | The company's phone number (main switchboard).
Step 2: Add Foreign Keys to the Tables
At first glance, separating the tables seems self-defeating because, if you've done the job properly, the two tables will have nothing in common. So the second step in this relational design is to define the commonality between the tables.
In the sales leads example, what is the common ground between the Contacts and Companies tables? It's that every one of the leads in the Contacts table works for a specific firm in the Companies table. So what's needed is some way of relating the appropriate information in Companies to each record in Contacts (without, of course, the inefficiency of simply cramming all the data into a single table, as we tried earlier).
The way you do this in relational database design is to establish a field that is common to both tables. You can then use this common field to set up a link between the two tables. The field you use must satisfy three conditions:
- It must not have the same name as a field that already exists in the table you are adding it to.
- It must uniquely identify each record in the table it comes from.
- To save space and reduce data entry errors, it must be the smallest field that satisfies the two preceding conditions.
In the sales leads example, a field needs to be added to the Contacts table that establishes a link to the appropriate record in the Companies table. The CompanyName field uniquely identifies each firm, but it's too large to be of use. The Phone field is also a unique identifier and is smaller, but the Contacts table already has a Phone field. The best solution is to use CompanyID, the Companies table's primary key field. Table 3.5 shows the revised structure of the Contacts table that includes the CompanyID field.
Table 3.5 The Final Structure of the Contacts Table
Field | Description
ContactID | The primary key.
CompanyID | The Companies table foreign key.
FirstName | The contact's first name.
LastName | The contact's last name.
Phone | The contact's phone number.
Fax | The contact's fax number.
Source | Where the lead came from.
Notes | Notes or comments related to the sales lead.
When a table includes a primary key field from a related table, the field is called a foreign key. Foreign keys are the secret to successful relational database design.
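To make the two-table design concrete, here is a minimal sketch of the Contacts and Companies structures with CompanyID acting as the foreign key. It assumes Python's sqlite3 module as a stand-in for Access, and the sample company and contacts are invented for illustration.

```python
import sqlite3

# Illustration only: SQLite stands in for Access; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Companies (
        CompanyID INTEGER PRIMARY KEY,
        CompanyName TEXT, Address TEXT, City TEXT, State TEXT, Zip TEXT, Phone TEXT
    );
    CREATE TABLE Contacts (
        ContactID INTEGER PRIMARY KEY,
        CompanyID INTEGER REFERENCES Companies (CompanyID),  -- the foreign key
        FirstName TEXT, LastName TEXT, Phone TEXT, Fax TEXT, Source TEXT, Notes TEXT
    );
    """
)

# The company is stored once; each contact carries only its CompanyID.
conn.execute(
    "INSERT INTO Companies (CompanyID, CompanyName, Address) "
    "VALUES (1, 'Acme Corp', '1 Main St')"
)
conn.executemany(
    "INSERT INTO Contacts (CompanyID, FirstName, LastName) VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (1, "Bob", "Ruiz")],
)

# An address change now touches a single Companies record,
# no matter how many contacts work there.
cur = conn.execute("UPDATE Companies SET Address = '9 Oak Ave' WHERE CompanyID = 1")
company_rows_changed = cur.rowcount
contact_count = conn.execute(
    "SELECT COUNT(*) FROM Contacts WHERE CompanyID = 1"
).fetchone()[0]
print(company_rows_changed, contact_count)  # 1 2
```

Note that SQLite only records the REFERENCES clause here. It does not enforce it unless foreign key support is switched on, which is a SQLite detail rather than anything specific to Access.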
Step 3: Establish a Link Between the Related Tables
After you have your foreign keys inserted into your tables, the final step in designing your relational model is to establish a link between the two tables. This step is covered in detail later in this chapter (see "Establishing Table Relationships").
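As a preview, in SQL terms a link between two tables corresponds to a join on the common field. The sketch below assumes Python's sqlite3 module rather than the Access query design window, uses trimmed-down versions of the two tables, and relies on invented sample rows.

```python
import sqlite3

# Illustration only: SQLite stands in for Access; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Companies (CompanyID INTEGER PRIMARY KEY, CompanyName TEXT, City TEXT);
    CREATE TABLE Contacts (
        ContactID INTEGER PRIMARY KEY,
        CompanyID INTEGER REFERENCES Companies (CompanyID),
        FirstName TEXT, LastName TEXT
    );
    INSERT INTO Companies VALUES (1, 'Acme Corp', 'Springfield');
    INSERT INTO Companies VALUES (2, 'Bolt Ltd', 'Dayton');
    INSERT INTO Contacts VALUES (1, 1, 'Ann', 'Lee');
    INSERT INTO Contacts VALUES (2, 1, 'Bob', 'Ruiz');
    INSERT INTO Contacts VALUES (3, 2, 'Cy', 'Ng');
    """
)

# Match each contact to its company through the shared CompanyID field.
linked = conn.execute(
    """
    SELECT Contacts.FirstName, Contacts.LastName, Companies.CompanyName
    FROM Contacts
    INNER JOIN Companies ON Contacts.CompanyID = Companies.CompanyID
    ORDER BY Contacts.ContactID
    """
).fetchall()
for row in linked:
    print(row)
```

Each contact record picks up the company name from the single matching Companies record, so the company data never has to be repeated in the Contacts table.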