Requirements Analysis: Dealing with Data
- Views of Data
- A Brief History of Data Architecture
- Advanced Data Management: Meta-data
- Graphics: Data Modeling
- Using Entity/Relationship and Object Models
- Normalization
- Data Modeling Conventions
- Entity/Relationship Model Validation
- The Requirements Analysis Deliverable: Column One
- Data and the Other Columns
- Conclusion
The first column of the Architecture Framework is about data. Data are all of those numbers and letters (and now pictures and sounds) that computers manipulate. The history of our industry is one of evolving attitudes toward data.
This chapter begins by linking the Architecture Framework with an understanding of perspectives on data that preceded it by twelve years. It then provides a brief history of data architecture and an exposition of different modeling techniques, before elaborating on where the techniques fit in the Framework. It closes with a discussion of the “normalization” process and a few words about data modeling conventions.
Views of Data
As Figure 3.1 shows, the Architecture Framework distinguishes six views of data: the planner's (scope) view, business owner's view, architect's view, designer's view, builder's view, and production view. As it happens, the second through fifth views map directly to views of data that were recognized long before John Zachman published his Framework. In 1975, a committee of the Computer and Business Equipment Manufacturers Association (commonly known as ANSI/SPARC) identified and defined a complete set of schemas characterizing the structure of data. A draft of its report was published, and the 42 schemas were condensed to just three. The result became known as the “Three Schema Architecture” [Tsichritzis and Klug, 1978].
Figure 3.1. The Architecture Framework—Data.
The business owner's view is not one but many ways of looking at a body of data—each one by a particular user or provider of the data. In each case, the data are organized according to terms appropriate for the job being done by the person. Each is shown in Figure 3.2 as an external schema. Note that each is different, but of course they may overlap, often using the same terms of reference, although even these terms may be defined differently.
Figure 3.2. Three Schema Architecture.
Note that the users whose views are reflected here may be either creators or consumers of data or both.
The information architect's view combines these different external views into a single, coherent definition of the enterprise's data. This unified version of the company's data is called the conceptual schema. A conceptual schema is an organization of data in which each datum is defined only once for the organization, and its relationship to all other data is clearly and uniquely defined as well. Each external schema may consist of a selection of these data, but the underlying definitions from the conceptual schema apply to all external schemata and are consistent across them all. The architect's view of data is the conceptual schema. It is often maintained by someone with the title Data Administrator or, more recently, Data Architect.
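The relationship between external schemata and the conceptual schema can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names, the definitions, and the two departments ("Sales" and "Credit") are hypothetical and do not come from the Framework or from the ANSI/SPARC report.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataElement:
    """One datum in the conceptual schema, defined exactly once."""
    name: str
    definition: str


# Hypothetical conceptual schema: a single definition per element.
CONCEPTUAL: Dict[str, DataElement] = {
    e.name: e
    for e in [
        DataElement("customer_name", "Legal name of a party that buys from us"),
        DataElement("order_date", "Date on which an order was accepted"),
        DataElement("credit_limit", "Maximum unpaid balance allowed for a customer"),
    ]
}


@dataclass
class ExternalSchema:
    """A business owner's view: a selection of elements under local terms."""
    owner: str
    terms: Dict[str, str]  # local term -> conceptual element name

    def definition_of(self, local_term: str) -> str:
        # Every local term resolves to the one shared definition.
        return CONCEPTUAL[self.terms[local_term]].definition


# Two hypothetical external schemata that overlap on one element.
sales = ExternalSchema("Sales", {"client": "customer_name", "booked on": "order_date"})
credit = ExternalSchema("Credit", {"account holder": "customer_name", "limit": "credit_limit"})
```

Here "client" and "account holder" are different terms in different views, yet both resolve to the same underlying definition, which is the consistency the conceptual schema is meant to guarantee.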
An internal schema is an organization of data according to the technology being used to record it. For a particular database management system, this includes the external terms of reference (“tables”, “segments”, “object classes”, etc.) and the internal components (“tablespaces”, etc.). It also includes terms for the physical storage of data on the computer (“cylinder”, “track”, etc.).
Note that the internal schema combines the designer's view with the builder's view. That is, it covers the full range of things required to convert a conceptual schema into a database. This includes selecting a particular database management system, modifying the model to accommodate problems of system performance, and determining the actual layout of files on a disk.
The organization of the conceptual schema is intentionally independent of both how people see the data and how the data are arranged in a particular database management system. More than that, the conceptual schema certainly does not reflect how the data are physically arranged on a storage device. The database management system and physical perspectives represent additional views of the data.
This independence means that the external schemata (business owners' views) can change, without changing the conceptual schema (information architect's view). The physical designer and builder can rearrange the data in a database or on a disk, without affecting the conceptual or external schemata.
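This independence can be illustrated with a small Python sketch. It assumes two hypothetical internal schemata (a row-oriented store and a column-oriented store) and one hypothetical external view; none of these names come from the original text, and real database management systems are of course far more elaborate.

```python
# Hypothetical records expressed in conceptual-schema terms.
# Assumption: every record carries the same set of elements.
RECORDS = [
    {"customer_name": "Acme", "credit_limit": 5000},
    {"customer_name": "Globex", "credit_limit": 12000},
]


class RowStore:
    """Internal schema A: keeps each record together."""

    def __init__(self, records):
        self._rows = [dict(r) for r in records]

    def read(self):
        return [dict(r) for r in self._rows]


class ColumnStore:
    """Internal schema B: keeps each element's values together."""

    def __init__(self, records):
        self._columns = {}
        for record in records:
            for name, value in record.items():
                self._columns.setdefault(name, []).append(value)

    def read(self):
        names = list(self._columns)
        count = len(self._columns[names[0]]) if names else 0
        return [{n: self._columns[n][i] for n in names} for i in range(count)]


def credit_view(store):
    """External schema: the credit department's terms for the same data."""
    return [
        {"account holder": r["customer_name"], "limit": r["credit_limit"]}
        for r in store.read()
    ]


# Rearranging the physical layout leaves the external view unchanged.
same_view = credit_view(RowStore(RECORDS)) == credit_view(ColumnStore(RECORDS))
```

Because the view is written only against conceptual element names, swapping one store for the other requires no change to `credit_view`.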
In the past, one of the elements of the internal schema was called the logical schema, which originally simply represented the conceptual schema in terms of a particular kind of database management system. In more recent years, however, the term “logical schema” has been used interchangeably with “conceptual schema”, thereby generating considerable confusion. This book is concerned only with the conceptual schema and its derivation from a set of external schemata. Because it deals with technology, the logical schema (in the original sense of the term) is the domain of the designer.