Enterprise Architecture
Creating and maintaining an enterprise architecture (EA) is a popular method for controlling data redundancies as well as process redundancies, and thereby reducing the anomalies and inconsistencies that uncontrolled redundancies inherently produce. EA comprises models that describe an organization in terms of its business architecture (business functions, business processes, business data, and so on) and technical architecture (applications, databases, and so on). The purpose of these models is to describe the actual business in which the organization engages. EA is applicable to all organizations, large and small. Because EA models are best built incrementally, one project at a time, it is appropriate to develop EA models on DW and BI projects, as well as on projects that simply solve departmental challenges.
EA includes at least five models, with the business data model and metadata repository being the two most important components for data quality.
- Business Function model—This model shows the hierarchy of business functions of an organization. In other words, it shows what the organization does. This model is used for organizing or reorganizing the company into its lines of business.
- Business Process model—This model shows the business processes being performed for the business functions. In other words, it shows how the organization performs its business functions. This model is used for business process reengineering and business process improvement initiatives.
- Business Data model—This model is the enterprise logical data model, also known as enterprise information architecture, that shows what data supports the business functions and business processes. This model contains:
  - Business objects (data entities)
  - Business activities involving these entities (data relationships)
  - Data stored about these entities (attributes)
  - Rules governing these entities and their attributes (metadata)
In the real world, business objects and data about those objects are intrinsically unique. Therefore, they appear as entities and attributes once and only once on a business data model, regardless of how many times they are redundantly stored in physical files and databases. There should be only one business data model for an organization showing the "single version of the truth" or the "360-degree view" of the organization.
- Application inventory—The application inventory is a description of the physical implementation objects that support the organization such as applications (programs and scripts), databases, and other technical components. It shows where the architectural pieces reside in the technical architecture. You should always catalog and document your systems because such inventories are crucial for performing impact analysis.
- Metadata repository—Models have to be supported by descriptive information, which is called metadata. Metadata is an essential tool for standardizing data, for managing and enforcing the data standards, and for reducing the amount of rework performed by developers or users who are not aware of what already exists and therefore do not reuse any architectural components.
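As a rough illustration of why an application inventory matters for impact analysis, the sketch below links one attribute of the business data model to the physical places where it is redundantly stored, then lists the applications a change would touch. All class, function, application, and column names here are hypothetical and chosen only for this example; a real inventory would normally live in a metadata repository rather than in code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalStore:
    """One physical copy of a business attribute (hypothetical structure)."""
    application: str
    database: str
    column: str


@dataclass
class BusinessAttribute:
    """An attribute that appears exactly once in the business data model."""
    entity: str
    name: str
    stores: list[PhysicalStore] = field(default_factory=list)


def impact_of_change(model, entity, name):
    """Return the sorted applications that hold any copy of the attribute."""
    attribute = model[(entity, name)]
    return sorted({store.application for store in attribute.stores})


# Illustrative data only: every name below is invented for this sketch.
customer_email = BusinessAttribute("Customer", "email_address", [
    PhysicalStore("billing", "BILLDB", "CUST.EMAIL"),
    PhysicalStore("crm", "CRMDB", "CONTACT.EMAIL_ADDR"),
    PhysicalStore("marketing_dw", "DWDB", "DIM_CUSTOMER.EMAIL"),
])
model = {(customer_email.entity, customer_email.name): customer_email}

affected = impact_of_change(model, "Customer", "email_address")
print(affected)  # ['billing', 'crm', 'marketing_dw']
```

The attribute is defined once, while its three physical copies hang off that single definition, which mirrors the "once and only once" rule for the business data model.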
Data Quality Improvement Process
In addition to applying enterprise-wide data quality disciplines, creating an enterprise data model, and documenting metadata, the data quality group should develop its own data quality improvement process. At the highest level, this process must address the six major components shown in Figure 3.2. These components are:
- Assess—Every improvement cycle starts with an assessment. This can be an initial enterprise-wide data quality assessment, a system-by-system data quality assessment, or a department-by-department data quality assessment. When performing the assessment, do not limit your efforts to profiling the data and collecting statistics on data defects. Analyze the entire data entry or data manipulation process to find the root causes of errors and to find process improvement opportunities.
Another type of assessment is a periodic data audit. This type of assessment is usually limited to one file or one database at a time. It involves data profiling as well as manual validation of data values against the documented data domains (valid data values). These domains should have already been documented as metadata, but if not, they can be found in programs, code translation books, online help screens, spreadsheets, and other documents. In the worst case, they can be discovered by asking subject matter experts.
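The automated part of such an audit, checking data values against their documented domains, might look like the following minimal sketch. The column names, domain values, and sample rows are assumptions made up for illustration; in practice the domains would come from the metadata repository and the rows from the file or database under audit.

```python
from collections import Counter

# Hypothetical documented domains (valid values), as they might be kept as metadata.
DOMAINS = {
    "gender_code": {"F", "M", "U"},
    "account_status": {"OPEN", "CLOSED", "SUSPENDED"},
}


def audit(records, domains):
    """Count, per column, the values that fall outside the documented domain."""
    defects = {column: Counter() for column in domains}
    for record in records:
        for column, valid_values in domains.items():
            value = record.get(column)
            if value not in valid_values:
                defects[column][value] += 1
    return defects


# Illustrative rows; a real audit would read one file or database table.
rows = [
    {"gender_code": "F", "account_status": "OPEN"},
    {"gender_code": "X", "account_status": "open"},
    {"gender_code": None, "account_status": "CLOSED"},
    {"gender_code": "X", "account_status": "SUSPENDED"},
]

report = audit(rows, DOMAINS)
for column, counts in report.items():
    print(column, dict(counts))
```

Counting each invalid value separately, rather than only totaling defects, keeps patterns such as a repeated undocumented code visible, which helps when tracing root causes.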
Figure 3.2 Data Quality Improvement Cycle
- Plan—After opportunities for improvement have been defined, the improvements should be analyzed and prioritized. Not all improvements have the same payback, and not all improvements are practical or even feasible. An impact analysis should determine which improvements have the most far-reaching benefits. After improvement projects have been prioritized, approved, and funded, they should be staffed and scheduled.
- Implement—In some cases, the data quality group can implement the approved improvements, but in many cases, other staff members from both the business side and IT will be required. For example, a decision might have been made that an overloaded column (a column containing data values describing multiple attributes) should be separated in a database. That would involve the business people who are currently accessing the database, the database administrators who are maintaining it, and the developers whose programs are accessing it.
- Evaluate—The best ideas sometimes backfire. Although some impact analysis will have been performed during planning, occasionally an adverse impact will be overlooked. Or worse, the implemented improvement might have inadvertently created a new problem. It is therefore advisable to monitor the implemented improvements and evaluate their effectiveness. If deemed necessary, an improvement can be reversed.
- Adapt—Hopefully, most improvements do not have to be reversed, but some may have to be modified before announcing them to the entire organization or before turning them into new standards, guidelines, or procedures.
- Educate—The final step is to disseminate information about the improvement just implemented. Depending on the scope of the change, education can be accomplished through classroom training, computer-based training, an announcement on the organization’s intranet website, an internal newsletter, or a simple e-mail notification.
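To make the overloaded-column example from the Implement step more concrete, here is a small sketch of separating one column into two attributes. It assumes a hypothetical column, cust_code, whose values combine a customer type and a region as "TYPE-REGION"; the encoding, the column names, and the function are invented for illustration and would differ in any real database.

```python
def split_overloaded(value):
    """Split a hypothetical 'TYPE-REGION' code into its two attributes.

    Returns (customer_type, region). A missing part is returned as None,
    so defects stay visible instead of being guessed at.
    """
    if value is None:
        return (None, None)
    customer_type, _, region = value.strip().partition("-")
    return (customer_type or None, region or None)


# Illustrative rows only.
rows = [
    {"id": 1, "cust_code": "R-NE"},
    {"id": 2, "cust_code": "W-SW"},
    {"id": 3, "cust_code": "R"},
]

separated = []
for row in rows:
    customer_type, region = split_overloaded(row["cust_code"])
    separated.append({"id": row["id"],
                      "customer_type": customer_type,
                      "region": region})

print(separated)
```

Even in this toy form, the change alters the shape of every row, which is why the business users, database administrators, and developers who depend on the column all need to be involved.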