- The Business Case for a New Design Process
- Improving the Development Process
- Overview of Data Integration Modeling
- Conceptual Data Integration Models
- Logical Data Integration Models
- Physical Data Integration Models
- Tools for Developing Data Integration Models
- Industry-Based Data Integration Models
- Summary
- End-of-Chapter Questions
Logical Data Integration Models
A logical data integration model produces a set of detailed representations of the data integration requirements that capture the first-cut source mappings, business rules, and target data sets (table/file). These models portray the logical extract, data quality, transform, and load requirements for the intended data integration application, and they are still technology-independent. The following sections discuss the various logical data integration models.
High-Level Logical Data Integration Model
A high-level logical data integration model defines the scope and boundaries for the project and the system, and is usually derived and augmented from the conceptual data integration model. A high-level data integration diagram serves the same purpose for a data integration system that a context diagram serves for a data flow diagram.
The high-level logical data integration model in Figure 3.6 provides the structure for what will be needed in the data integration system and outlines the logical models, such as the extract, data quality, transform, and load components.
Figure 3.6 Logical high-level data integration model example
Logical Extraction Data Integration Models
The logical extraction data integration model determines which subject areas will need to be extracted from which sources, such as applications, databases, flat files, and unstructured sources.
Source file formats should be mapped to the attribute/column/field level. Once extracted, source data files should be loaded by default to the initial staging area.
Figure 3.7 depicts a logical extraction model.
Figure 3.7 Logical extraction data integration model example
Extract data integration models consist of two discrete subprocesses or components:
- Getting the data out of the source system—Whether the data is actually extracted from the source system or captured from a message queue or flat file, the network connectivity to the source must be determined, the number of tables/files must be reviewed, and the files to extract, and the order in which to extract them, must be determined.
- Formatting the data to a subject area file—As discussed in Chapter 2, "An Architecture for Data Integration," subject area files provide a layer of encapsulation from the source to the final target area. The second major component of an extract data integration model is to rationalize the data from the source format to a common subject area file format, for example, mapping a set of Siebel Customer Relationship Management tables to a customer subject area file; a sketch of both components appears after this list.
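The following Python sketch illustrates the two extract subprocesses under stated assumptions: the source is a hypothetical CUST_MASTER table (sqlite3 stands in for the real source system), and the subject area column names are illustrative only, not a prescribed layout.

```python
import sqlite3
import pandas as pd

# Hypothetical mapping from source columns to the customer subject area
# file layout; a real project would capture this in the source-to-target
# mapping at the attribute/column/field level.
SOURCE_TO_SUBJECT_AREA = {
    "CUST_ID": "customer_number",
    "CUST_NM": "customer_name",
    "CUST_ADDR": "customer_address",
}

def extract_source(conn: sqlite3.Connection) -> pd.DataFrame:
    # Subprocess 1: get the data out of the source system.
    return pd.read_sql_query(
        "SELECT CUST_ID, CUST_NM, CUST_ADDR FROM CUST_MASTER", conn
    )

def format_to_subject_area(source_df: pd.DataFrame) -> pd.DataFrame:
    # Subprocess 2: rationalize the source layout into the common
    # customer subject area file format.
    return source_df.rename(columns=SOURCE_TO_SUBJECT_AREA)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUST_MASTER (CUST_ID INT, CUST_NM TEXT, CUST_ADDR TEXT)"
    )
    conn.execute("INSERT INTO CUST_MASTER VALUES (1, 'Acme Corp', '1 Main St')")
    conn.commit()

    subject_area_file = format_to_subject_area(extract_source(conn))
    # By default, the extracted data lands in the initial staging area.
    subject_area_file.to_csv("customer_subject_area.csv", index=False)
```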
Logical Data Quality Data Integration Models
The logical data quality data integration model contains the business and technical data quality checkpoints for the intended data integration process, as demonstrated in Figure 3.8.
Figure 3.8 Logical data quality data integration model example
Regardless of the technical or business data quality requirements, each data quality data integration model should be able to produce a clean file, a reject file, and a reject report, which would then be instantiated in the selected data integration technology.
In addition, the error handling for the entire data integration process should be designed as a reusable component.
As discussed in the data quality architectural process in Chapter 2, a well-defined data quality process will produce a clean file, a reject file, and a reject report. Based on an organization's data governance procedures, the reject file can be leveraged for manual or automatic reprocessing.
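A minimal sketch of such a checkpoint follows, assuming the customer subject area file from the extract example; the two rules shown (a technical check for a missing key and a business check for a blank name) are illustrative stand-ins for real data quality checkpoints.

```python
import pandas as pd

def data_quality_checkpoint(df: pd.DataFrame):
    # Accumulate a reject reason per record; an empty reason means clean.
    reasons = pd.Series("", index=df.index, dtype=object)
    # Technical check: the key attribute must be populated.
    reasons[df["customer_number"].isna()] += "missing customer_number; "
    # Business check: the customer name must not be blank.
    reasons[df["customer_name"].fillna("").str.strip() == ""] += "blank customer_name; "

    rejected = reasons != ""
    clean_file = df[~rejected]
    reject_file = df[rejected].assign(reject_reason=reasons[rejected])
    return clean_file, reject_file

if __name__ == "__main__":
    staged = pd.DataFrame({
        "customer_number": [1, None, 3],
        "customer_name": ["Acme Corp", "Beta LLC", ""],
    })
    clean_file, reject_file = data_quality_checkpoint(staged)
    clean_file.to_csv("customer_clean.csv", index=False)
    reject_file.to_csv("customer_reject.csv", index=False)
    # Reject report: reject counts by reason, for data governance review
    # and manual or automatic reprocessing.
    reject_file["reject_reason"].value_counts().to_csv("customer_reject_report.csv")
```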
Logical Transform Data Integration Models
The logical transform data integration model identifies, at a logical level, what transformations (in terms of calculations, splits, processing, and enrichment) need to be performed on the extracted data to meet the business intelligence requirements for aggregation, calculation, and structure, as demonstrated in Figure 3.9.
Figure 3.9 Logical transformation data integration model example
Transform types, as defined in the transformation processes, are determined by the business requirements for conforming, calculating, and aggregating data into enterprise information, as discussed in the transformation architectural process in Chapter 2.
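The sketch below illustrates conform, calculate, and aggregate transform types on a hypothetical order subject area file; the specific rules (standardizing a status code, deriving an order total, summing by customer) are assumptions for illustration, not rules from this chapter.

```python
import pandas as pd

def transform_orders(orders: pd.DataFrame) -> pd.DataFrame:
    # Conform: standardize source-specific status codes to one domain.
    orders["status"] = orders["status"].str.upper().map({"O": "OPEN", "C": "CLOSED"})
    # Calculate: enrich each record with a derived order_total.
    orders["order_total"] = orders["quantity"] * orders["unit_price"]
    # Aggregate: roll order-level detail up to customer-level information.
    return orders.groupby("customer_number", as_index=False)["order_total"].sum()

if __name__ == "__main__":
    sample = pd.DataFrame({
        "customer_number": [1, 1, 2],
        "status": ["o", "c", "o"],
        "quantity": [2, 1, 5],
        "unit_price": [10.0, 99.0, 3.5],
    })
    print(transform_orders(sample))
```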
Logical Load Data Integration Models
Logical load data integration models determine, at a logical level, what is needed to load the transformed and cleansed data into the target data repositories by subject area, as portrayed in Figure 3.10.
Figure 3.10 Logical load data integration model example
Designing load processes by target, and by the subject areas within the defined target databases, allows subprocesses to be defined that further encapsulate the target from changes in the source data, avoiding significant maintenance effort. For example, when changes to the physical database schema occur, only the subject area load job needs to change, with little impact on the extract and transform processes; a sketch of one such subject area load job follows.
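This is a minimal sketch under the same assumptions as the earlier examples: sqlite3 stands in for the real target repository, and the CUSTOMER table is a hypothetical physical target schema.

```python
import sqlite3
import pandas as pd

def load_customer_subject_area(df: pd.DataFrame, conn: sqlite3.Connection) -> None:
    # Only this load job knows the physical CUSTOMER schema; a schema
    # change is absorbed here, leaving the extract and transform
    # processes untouched.
    df.to_sql("CUSTOMER", conn, if_exists="append", index=False)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUSTOMER (customer_number INT, order_total REAL)")
    transformed = pd.DataFrame({
        "customer_number": [1, 2],
        "order_total": [119.0, 17.5],
    })
    load_customer_subject_area(transformed, conn)
    print(conn.execute("SELECT * FROM CUSTOMER").fetchall())
```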