3.2 Anatomy of Business Intelligence
Examining your environment, understanding what is going on around you—this is what BI is all about. As we discussed in Chapter 2, the very fabric of our environment has undergone a massive transformation. The threads of this fabric have gone from person-to-person interactions to person-to-system interactions and in many cases to system-to-system interactions. The catalyst for this transformation is the Internet. The Internet has changed the way organizations interact with one another as well as with their customers. An organization's BI must evolve to meet the challenges of this new environment.
Figure 3.1 presents the BI loop described in Chapter 1. As we look at this figure, we see the components of BI. Each component has the potential to be changed in significant ways by the Internet. Data from outside the data warehouse flows into the BI loop through the operational environment. This data contains information about customers, suppliers, competitors, products, and the organization itself. The Internet expands the information sources of the warehouse. It reaches beyond what is contained within the organization's internal systems, across the Internet, to include partner, supplier, and customer systems. It spans the entire breadth of the value chain.
Figure 3.1. Business Intelligence loop.
As we extract the data from the operational environment, we cleanse and transform it to make it more consistent with the data in our warehouse. It is then stored in some central repository. This central repository can be either a multidimensional or relational database. The extraction, cleansing, transformation, and storage of data is the data warehouse/data mart portion of the BI loop. We will discuss the data warehouse in more detail in section 3.2.1.
DSS is the next step in the loop. DSS retrieves data and presents it to the decision maker. We often think of DSS as a multidimensional tool that is a complex, advanced system. At times, it is. At other times, we can consider simple reporting as DSS. In fact, DSS is a full spectrum of systems, ranging from reporting through OnLine Analytical Processing (OLAP) to data mining. As we discuss IEBI, the tremendous impact the Internet has had on DSS will become clear. We will see that Java provides support for developing Web-enabled DSS; that XML offers a common way to share data with devices of drastically different capabilities; and that the Common Warehouse Metadata Interchange (CWMI) provides a means to share metadata between systems. In later chapters, we discuss how DSS can use the capabilities of the Internet to deliver support to decision makers throughout the entire organization. We will examine DSS in more detail in section 3.2.4.
Data warehousing and DSS are just a means to an end; they might even be considered implementation details. BI is the process and systems used by the organization to define its strategic direction. As such, BI also includes applications that are often overlooked by the novice. Two such applications are the Balanced Scorecard and Activity Based Costing (ABC). What is most interesting about these applications is that they did not originate within IT departments but within the business community. We will examine each of these applications in section 3.2.5.
Integral to this loop is the decision maker. He or she takes information extracted from the data warehouse and delivered by the DSS to define some plan of action. This plan is not necessarily a change in course; the data may support maintaining the present direction. Rather than considering the decision maker as the end user of BI, we should consider him or her as part of the process. The operational environment reflects the result of his or her decisions. These results are then fed back into the data warehouse and another iteration of the loop begins again. Decision makers are just as much a part of the BI loop as any other component. We will examine the role of the decision maker as well as the different types of decision makers in section 3.2.6.
As we discussed in Chapter 1, the loop is composed of three basic steps: acquire the data, analyze the data, and take action based on the data. We can describe these steps as the three A's: acquire, analyze, and act.
3.2.1 THE DATA WAREHOUSE2
The previous section discussed the unique purpose of BI. This section describes how the data warehouse, the heart of the BI loop, meets these special needs. In this section we define a data warehouse and identify each of its components. We already know that the warehouse sits outside the operational environment and receives its data from it. We also know that the purpose of the warehouse is to provide a central repository for strategic information that will be used as a basis for business strategy.
Given this understanding of the data warehouse, we see it is a system sitting apart from the operational environment that feeds off of it. What is it, though? The very term data warehouse evokes images of large buildings of corrugated metal where dingy yellow forklifts loaded down with crates of information scurry between bare steel girders. To the computer literate, there is some nebulous vision of big computers with petabytes of disk drive space, seasoned with a dash of some sort of online archive of operational data. In this idealized world, users magically fly through volume upon volume of information, rooting out that one piece of information that is going to make all the difference.
The situation is similar to the old story of the blind men and the elephant. One blind man felt the legs and thought the elephant was like a tree, while another felt the trunk and thought it was like a snake. Finally, a third felt the tail and thought it was like a rope. Each view grasps some element of the truth while missing the overall picture. It is true that data moves from the operational environment to the warehouse, but the warehouse is more than an archive. It is also true that the warehouse contains large volumes of data, but the central repository where the data is stored is only part of the overall warehouse. The key to understanding the data warehouse is how the parts interact with one another, a Gestalt, if you will, of the data warehouse. The data warehouse is clearly a case where the whole is greater than the sum of the parts. Figure 3.2 presents each of these parts and their place in the warehouse.
Figure 3.2. Data warehouse.
Let's follow the path data takes from the operational environment to the decision maker:
-
Operational Environment— The operational environment runs the day-to-day activities of the organization. Such systems as order entry, accounts payable, and accounts receivable reside within the operational environment. These systems collectively contain the raw data that describes the current state of the organization.
-
Independent Data Mart— A common misconception is that a data mart is a small data warehouse. The difference between the data mart and the data warehouse is scope. The data mart focuses on an individual subject area within the organization, where the scope of the data warehouse is the entire organization. An independent data mart receives data from external sources and the operational environment independent of any data warehouse.
-
Extraction— The extraction engine receives data from the operational environment. The extraction process can occur in a variety of ways. The warehouse can be the passive recipient of data, where the operational environment passes the data to the warehouse, or it may actively retrieve data from the operational environment. Transportable tables and data replication are examples of alternative techniques for moving data into the warehouse. A minimal sketch of the overall flow, from extraction through cleansing to the central repository, follows this list.
-
Extraction Store— Data received from the operational environment must be scrubbed before it is incorporated into the data warehouse. The extraction store holds the extracted data while it is awaiting transformation and cleansing. It is like the Ellis Island of data warehousing.
-
Transformation and Cleansing— Scrubbing consists of data transformation and cleansing. Data transformation is the process of converting data from different systems and different formats into one consistent format. Cleansing is the process of removing errors from the data.
-
Extraction Log— As the operational data is integrated into the data warehouse, an extraction log is maintained to record the status of the extraction process. This log is actually part of the data warehouse's metadata and is critical in maintaining data quality. This log will serve as input to the data administrator to verify the quality of the data integrated into the warehouse.
-
External Source— Data originating from outside the organization is also included in the data warehouse. This external data could include such information as stock market reports, interest rates, and other economic information. An external source could also provide metadata such as Standard Industrial Classification (SIC) codes.
-
Data Administrator— The role of the data administrator is to ensure the quality of the data in the warehouse. This role should not be confused with the database administrator. The database administrator is responsible for the operation of the system that supports the data warehouse. The data administrator is the team member responsible for the quality of the data within the warehouse. One of the responsibilities of the data administrator is to review the extraction log for changes in metadata, inaccurate data from the operational environment, or even data errors generated by the operational system. The data administrator will take the necessary corrective actions, such as making changes to the metadata repository, correcting erroneous data, or notifying operations of programming errors.
-
Central Repository— The central repository is the cornerstone of the data warehouse architecture. This central location stores all the data and metadata for the data warehouse.
-
Metadata— This is data about data. One way to describe metadata is that it provides the context of the data. It describes what kind of information is stored, where it is stored, how it is encoded, how it relates to other information, where it comes from, and how it is related to the business. Metadata also contains the business rules of the data—the use of the data within the organization.
-
Data— The data store contains the raw data of the data warehouse. The central data store can be either a multidimensional database or a relational database system. The structure of this data and how this structure is designed is the focus of this book.
-
Dependent Data Mart— A common misconception is that a data mart is a small data warehouse. The difference between the data mart and the data warehouse is scope. The data mart focuses on a specific subject area within the organization; the scope of the data warehouse is the entire organization. A dependent data mart relies on the data warehouse for the source of its data.
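The flow just described can be summarized in a short sketch. The following Python fragment is purely illustrative; the record layouts, the validity rule, and names such as extraction_store and extraction_log are hypothetical stand-ins for whatever ETL tooling an actual warehouse uses.

```python
# A minimal, hypothetical sketch of the flow in Figure 3.2:
# operational source -> extraction store -> transform/cleanse -> central repository,
# with an extraction log recorded for the data administrator.
from datetime import date

operational_orders = [                       # operational environment (mock data)
    {"customer": "IBM", "amount": "1200.50", "order_date": "2002-03-01"},
    {"customer": "",    "amount": "95.00",   "order_date": "2002-03-01"},   # bad record
]

extraction_store = list(operational_orders)  # raw data awaiting scrubbing
central_repository = []                      # cleansed, integrated data
extraction_log = []                          # record of this run, reviewed by the data administrator

for record in extraction_store:
    if not record["customer"]:
        # cleansing failure: log it so the data administrator can follow up with operations
        extraction_log.append({"status": "rejected", "reason": "missing customer", "record": record})
        continue
    # transformation: convert to the warehouse's consistent types and formats
    central_repository.append({
        "customer": record["customer"].upper(),
        "amount": float(record["amount"]),
        "order_date": date.fromisoformat(record["order_date"]),
    })
    extraction_log.append({"status": "loaded", "record": record})

print(len(central_repository), "row(s) loaded;",
      sum(e["status"] == "rejected" for e in extraction_log), "rejected")
```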
This description of the warehouse, however, examines only the parts. It does not provide us with a complete picture of the warehouse. Let's take a moment to look at the data warehouse as a complete entity. Almost everything written on data warehousing begins with the obligatory and often verbose comparison between the data warehouse and the transaction-oriented operational world. Despite what Emerson may have said concerning consistency, that it is the hobgoblin of little minds, a study contrasting these two environments clarifies some of the most important characteristics of the data warehouse.
W. H. Inmon defines the data warehouse as “a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.”3 I find this definition of data warehousing to be most clear in that it highlights the most vital features of the warehouse. In the next few subsections we will discuss each of these characteristics and how they differ from the operational sources.
3.2.1.1 Subject-Oriented
The first characteristic of a data warehouse, as described by Inmon, is subject orientation. The operational environment focuses its attention on the day-to-day transactions that are part of the normal operation of the business. The data warehouse is concerned with the things in the business environment that are driving those transactions. Figure 3.3 shows the difference in data orientation. This difference has far-reaching effects on the entire system.
Figure 3.3. Data orientation.
The transaction-oriented system structures data in a way that optimizes the processing of transactions. These systems typically deal with many users accessing a few records at a time. For reasons too numerous to discuss here, minimizing record and table size improves overall system performance. System architects normalize transaction databases to structure the database in an optimal way. Although a complete discussion of normalization is outside the scope of this book, it is sufficient to note that data pertaining to a specific subject is distributed across multiple tables within the database. For example, an employee works in a department that is part of a division. The employee, department, and division information is all related to the subject employee. Yet, that data will be stored in separate tables.
Operational data is distributed across multiple applications as well as tables within an application. A particular subject may be involved in different types of transactions. A customer appearing in the accounts receivable system may also be a supplier appearing in the accounts payable system. Each system has only part of the customer data. We are back to the blind men and the elephant. Nowhere is there a single consolidated view of the one organization.
Considering the way in which the decision maker uses the data, this structure is very cumbersome. First, the decision maker is interested in the behavior of business subjects. To get a complete picture of any one subject, the strategist would have to access many tables within many applications. The problem is even more complex. The strategist is not interested in one occurrence of a subject or an individual customer, but in all occurrences of a subject and all customers. As one can easily see, retrieving this data in real time from many disparate systems would be impractical.
The warehouse, therefore, gathers all of this data into one place. The structure of the data is such that all the data for a particular subject is contained within one table. In this way, the strategist can retrieve all the data pertaining to a particular subject from one location within the data warehouse. This greatly facilitates the analysis process, as we shall see later. The task of associating subjects with actions to determine behaviors is much simpler.
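To make the contrast concrete, here is a small sketch using SQLite through Python. The table and column names are invented for illustration; the point is simply that the operational model spreads the subject across three normalized tables, while the warehouse carries everything about the subject in a single pre-joined table.

```python
# Hypothetical illustration of the difference in data orientation.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE division   (div_id INTEGER PRIMARY KEY, div_name TEXT);
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT, div_id INTEGER);
    CREATE TABLE employee   (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO division   VALUES (1, 'Consumer Products');
    INSERT INTO department VALUES (10, 'Sales', 1);
    INSERT INTO employee   VALUES (100, 'Smith', 10);
""")

# Operational (normalized) view: one join per level of the hierarchy.
normalized = con.execute("""
    SELECT e.emp_name, d.dept_name, v.div_name
      FROM employee e
      JOIN department d ON e.dept_id = d.dept_id
      JOIN division   v ON d.div_id  = v.div_id
""").fetchall()

# Subject-oriented warehouse view: everything about the subject 'employee'
# is carried in a single, pre-joined table.
con.execute("CREATE TABLE dw_employee AS "
            "SELECT e.emp_name, d.dept_name, v.div_name "
            "FROM employee e JOIN department d ON e.dept_id = d.dept_id "
            "JOIN division v ON d.div_id = v.div_id")
subject_oriented = con.execute("SELECT * FROM dw_employee").fetchall()

print(normalized == subject_oriented)   # same facts, very different structure
```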
3.2.1.2 Integration
The difference in the orientation of the data drives the need to gather the data into one place. The warehouse, however, does more than gather the data. In a sense, it derives its data from the operational environment. The operational data is the basis of the warehouse. The integration process (see Figure 3.4) forms this data into a single cohesive environment. The origin of the data is invisible to the decision maker in this environment. The integration process consists of two tasks: data cleansing and data transformation.
Figure 3.4. Data integration.
Data Cleansing
Data cleansing is the process of removing errors from the input stream and is part of the integration process. It is perhaps one of the most critical steps in the data warehouse. If the cleansing process is faulty, the best thing that could happen is that the decision maker will not trust the data and the warehouse will fail. If that's the best thing, what could be worse? The worst thing is that the warehouse could provide bad information and the strategist could trust it. This could mean the development of a corporate strategy that fails. The stakes are indeed high.
A good cleansing process, however, can improve the quality of not only the data within the warehouse, but the operational environment as well. The extraction log records errors detected in the data cleansing process. The data administrator in turn examines this log to determine the source of the errors. At times, the data administrator will detect errors that originated in the operational environment. Some of these errors could be due to a problem with the application or something as simple as incorrect data entry. In either case, the data administrator should report these errors to those responsible for operational data quality. Some errors will be due to problems with the metadata. Perhaps the cleansing process did not receive a change to the metadata. Perhaps the metadata for the cleansing process was incorrect or incomplete. The data administrator must determine the source of this error and take corrective action. In this way, the data warehouse can be seen as improving the quality of the data throughout the entire organization.
There is some debate as to the appropriate action for the cleansing process to take when errors are detected in the input data stream. Some purists feel the warehouse should not incorporate records with errors. The errors in this case should be reported to the operational environment, where they will be corrected and then resubmitted to the warehouse. Others feel that the records should be corrected whenever possible and incorporated into the warehouse. Errors are still reported to the operational environment, but it is the responsibility of those maintaining the operational systems to take corrective action. The concern is making sure that the data in the warehouse reflects what is seen in the operational environment. A disagreement between the two environments could lead to a lack of confidence in the warehouse.
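The two positions can be sketched in a few lines of Python. The record layout, the validity rule, and the default correction are all hypothetical; the sketch only illustrates that, under either policy, the error is reported back to the operational side.

```python
# Hypothetical sketch of the two cleansing policies described above.
# Policy "reject": erroneous records never enter the warehouse; operations corrects and resubmits.
# Policy "correct": the record is corrected and loaded, and the error is still reported.

def cleanse(record, policy, error_report):
    if record.get("country") not in ("US", "CA"):      # hypothetical validity rule
        error_report.append(record)                    # either way, operations is told
        if policy == "reject":
            return None                                # record never enters the warehouse
        record = dict(record, country="US")            # assumed default correction
    return record

errors = []
rows = [{"customer": "Acme", "country": "USA"}]        # mis-coded country
loaded_reject  = [r for r in (cleanse(x, "reject",  errors) for x in rows) if r]
loaded_correct = [r for r in (cleanse(x, "correct", errors) for x in rows) if r]
print(len(loaded_reject), len(loaded_correct), len(errors))   # 0 loaded vs. 1 loaded; 2 error reports
```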
The cleansing process cannot detect all errors. Some errors are simple and honest typographical mistakes. There are errors in the data that are more nefarious and will challenge the data administrator. For example, one system required the entry of the client's SIC code for every transaction. The sales representatives did not really care and found two or three codes that would be acceptable to the system. They entered these standby codes into the transaction system whenever the correct code was not readily available. These codes were then loaded into the data warehouse during the extraction. While there are many tools available on the market to assist in cleansing the data as it comes into the warehouse, errors such as these make it clear that no software product can get them all.
Data cleansing is the child of the data administrator. This is an essential position on the data warehouse team. The data administrator must take a proactive role in rooting out errors in the data. While there is no one component that will guarantee the success of a data warehouse, there are some that will ensure its failure. A poor cleansing process or a torpid data administrator is definitely a key to failure.
Data Transformation
Rarely does one encounter operational environments where data is consistent between applications. In a world of turnkey systems and best-of-breed heterogeneous environments, it would be more surprising to see data consistency than not. Data transformation addresses this issue. The data transformation process receives the input streams from the different operational systems and transforms them into one consistent format.
The sheer task of defining the inconsistencies between operational systems can be enormous. Table 3.1 demonstrates the different types of integration challenges facing the data warehouse architect. The table shows that as each new source of data is identified, the complexity of the integration process increases. An analysis of each system contributing to the data warehouse must be performed to understand both the data elements that are of interest and the format of these elements. Once these elements have been selected and defined, an integration process must be defined that will provide consistent data.
Table 3.1. Integration Issues
| | Sales Voucher | Purchase Order | Inventory |
|---|---|---|---|
| Description | Customer Name | Customer Name | Customer Name |
| | International Business Machines | | |
| Encoding | Sex | Sex | Sex |
| Units | Cable Length | Cable Length | Cable Length |
| Coding | Key | Key | Key |
Table 3.1 presents some of the basic issues concerning data integration. Let's look at them in detail; a short transformation sketch follows the list:
-
Description— This can be the most heinous of all integration issues. How does one determine that the three names presented in the table represent the same client? The transformation process must take each different description and map it to a specific customer name.
-
Encoding— There are four types of scales:4 nominal, ordinal, interval, and ratio. When discussing encoding, we are concerned with a nominal scale. This scale is the simplest of all four scales. A number or letter is assigned as a label for identification or classification of that object. When integrating this data, map the input scales to the data warehouse scale.
-
Units of Measure— Integration of units of measure can be deceptive. While it may seem at first that it would be a simple mathematical calculation, issues such as precision must be considered when making these conversions.
-
Format— The originating operational systems may store data in a variety of formats. The same data element may be stored as character in one system and numeric in the next. As with the integration of all data elements, consider the ultimate use of the data within the warehouse.
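A small transformation sketch tying these four issues together follows. The lookup tables, field names, and target formats are invented for illustration and do not come from any particular tool.

```python
# Hypothetical sketch of a transformation step covering the four issues above:
# description mapping, nominal-scale encoding, unit conversion, and format conversion.

DESCRIPTION_MAP = {                  # description: many source spellings, one warehouse name
    "IBM": "International Business Machines",
    "I B M": "International Business Machines",
}
SEX_MAP = {"M": 1, "F": 2, "1": 1, "0": 2, "male": 1, "female": 2}   # nominal encoding

FEET_PER_METER = 3.28084

def transform(source_row):
    return {
        "customer_name": DESCRIPTION_MAP.get(source_row["customer"], source_row["customer"]),
        "sex_code": SEX_MAP[source_row["sex"]],
        # units: this source carries cable length in meters; the warehouse stores feet
        "cable_length_ft": round(float(source_row["cable_length_m"]) * FEET_PER_METER, 2),
        # format: the source key is character data; the warehouse key is numeric
        "customer_key": int(source_row["cust_key"]),
    }

print(transform({"customer": "IBM", "sex": "male",
                 "cable_length_m": "3.0", "cust_key": "00042"}))
```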
One final note on the transformation process: Do not underestimate the task of defining an enterprise data format. It is necessary to get consensus on any format. The unfortunate truth is that when more than one person is involved in a decision, there are politics involved. Surprisingly, data elements can become highly controversial and political topics. Forewarned is forearmed. When defining data elements, expect political battles.
3.2.1.3 Nonvolatile
A major difference between the data warehouse and a transaction-oriented operational system is volatility. In the operational environment, data is volatile—it changes. In the data warehouse, however, once the data is written, it remains unchanged as long as it is in the warehouse. Figure 3.5 demonstrates the difference between the two system types as it relates to volatility. We begin on Monday. The quantity on hand for product AXY is 400 units. This is recorded in the inventory system in record XXX. During the Monday extraction, we store the data in the warehouse in record ZZZ. Tuesday's transactions reduce the quantity on hand to 200. These updates are carried out against the same record XXX in the inventory system. Tuesday night, during the extraction process, the new quantity is extracted and recorded in a completely separate data warehouse record YYY. The previous ZZZ record is not modified.
Figure 3.5. Data volatility.
In essence, the nonvolatility of the data warehouse creates a virtual read-only database system. No database can literally be read-only. Somehow, at some time, data must be stored in the database. The data warehouse does this in bulk. An extraction adds new records to the database; detail records already in the database are not modified. One of the challenges of a transaction processing system is that multiple users attempt to read and write to the same records, causing the database to lock records. This is not a concern in the data warehouse, since users only read the data.
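A few lines of Python, mirroring Figure 3.5, may make the distinction clearer. The record identifiers and field names are hypothetical.

```python
# Hypothetical sketch of nonvolatility. The operational system updates a record
# in place; the warehouse only ever appends a new snapshot row.

inventory = {"XXX": {"product": "AXY", "qty_on_hand": 400}}   # operational record XXX
warehouse = []                                                # append-only snapshot rows

def nightly_extract(snapshot_date):
    for rec in inventory.values():
        warehouse.append({"snapshot_date": snapshot_date, **rec})   # new row; old rows untouched

nightly_extract("Monday")
inventory["XXX"]["qty_on_hand"] = 200      # Tuesday's transactions update record XXX in place
nightly_extract("Tuesday")

print(warehouse)
# [{'snapshot_date': 'Monday', 'product': 'AXY', 'qty_on_hand': 400},
#  {'snapshot_date': 'Tuesday', 'product': 'AXY', 'qty_on_hand': 200}]
```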
The database engine itself benefits from the nonvolatile nature of the data warehouse. While it is still critical that appropriate backup procedures be in place for the central repository, the database can eliminate many background processes used for recovery. For example, databases generally keep a redo log. These logs allow database administrators to return the database to its proper state after instance failure. Since updates are not being made against the data warehouse, there is no need to run this process.
3.2.1.4 Time-Variant Collection of Data
The nonvolatility of the data within the warehouse adds another dimension to the data warehouse, the dimension of time. If one were able to extract all the data from the operational systems at one specific moment in time, it would create a snapshot of the state of the organization. The warehouse, in essence, does this. At specified intervals, the warehouse takes a snapshot of the operational environment. The snapshots stored in the warehouse become frames on a roll of film, and this renders a movie. Time is not a variable for a snapshot; it is static. In a movie, however, time becomes a variable. The film can be run in whatever direction and at whatever speed the viewer may wish.
The data warehouse is like a movie. The decision maker can view the data across the field of time at whichever level of detail he or she may wish. This allows the business analyst to view patterns and trends over time. Time has become one variable that the analysis can manipulate. In short, the data warehouse is time-variant.
3.2.1.5 Supporting Management's Decision
The first section of this chapter discussed the strategic mission of the data warehouse. With this understanding, we see yet another difference between the data warehouse and the operational environment. The typical operational system is some automation of a manual process. The user community, therefore, is typically involved in the lines of production. As we said earlier, the data warehouse user is the decision maker. The strategist is any individual within an organization responsible for the strategy of any part of the organization. This includes product managers, marketing managers, department managers, and even CEOs.
It is very important that the actual decision maker interacts with the data. Management can no longer be satisfied acting as the passive recipient of static reports generated by the IT department. The average IT professional will not have the business acumen of the decision maker. When the strategist examines data, he or she will see things in the data that will lead to further inquiries. Some of these cues will be more overt than others; some may even be the result of a sixth sense on the part of the strategist. Static reports will not answer these inquiries, nor can the IT professional be expected to anticipate what questions may be asked. Regardless of what prompts the inquiry, the decision maker must be the main user.
The data warehouse architect must keep this difference in user communities in mind when building the system. It is critical that the system renders the appropriate performance to allow the decision maker to interact with the data in a timely and efficient manner. The user interface must also be designed to allow the decision maker to explore the data within the warehouse. The challenge for the decision maker should be understanding the data, not retrieving the data from the system.
3.2.2 DISTRIBUTED VERSUS CENTRALIZED WAREHOUSE
One of the implications of the Internet is the ability to distribute the data warehouse. A distributed data warehouse is an integrated set of data stores that are physically distributed across a company's information infrastructure. Figure 3.6 presents the overall architecture of a distributed data warehouse. In this environment, we see independent data marts dispersed throughout the organization. These distributed data warehouses can be composed of homogenous environments in which the same database management system (DBMS) and operating system support the data marts, or they can be heterogeneous environments composed of multiple types of systems. As users query the data from the distributed data warehouse, the data is collated from these systems by either the client-tier or middle-tier application.
Figure 3.6. The distributed data warehouse.
Proponents of the distributed data warehouse contend that distribution of the data solves several problems encountered by one large system. One of the key factors in the success of a data warehouse is timeliness. It is important to get the system up and running quickly, delivering the C-level executive who sponsored the system some proof that his or her support was well founded. To sift through the metadata of the many systems of even a moderately sized enterprise is a huge task that can take many months. In addition, the politics involved in finding consensus among the various departments within a single organization is enough to run the launch of any data warehouse aground. This is, of course, a major impediment to delivering a system as quickly as possible.
Putting aside the “people” issues, one can see tremendous technical challenges to the single enterprisewide data warehouse. Even if we achieve consensus among the user community and we succeed in understanding all the metadata within our organization, we are still faced with the challenge of extracting that data and putting it into a single data repository. Once that is completed, imagine the size of such a system! Data warehouses running 20 to 30 terabytes are becoming more common, creating administration and maintenance challenges.
One would think that the decision whether to distribute or centralize the data warehouse is a no-brainer. Isn't it obvious that we should distribute? Well, on paper, even communism works if you are willing to ignore a few painful realities. Distributing the data warehouse is a patently BAD idea. At one time, the statements made above may have been true and the decision to distribute in some environments might not have been all that devastating. In today's world, it is not only possible to create a single enterprisewide data warehouse, it is preferable.
Let's start by thinking back to our metaphor. The information infrastructure is to the organization what the central nervous system is to the organism. The point of a central nervous system is that the intelligence is centralized. There are no organisms currently in existence that have more than one brain. In all organisms, one brain does the thinking and coordinates the activities of the entire organism. All parts of the organism work together for the common good. They do this through the coordination provided by the central nervous system. The organization must do the same—if all parts of the system are to work together in unison, then the intelligence of the organism must be centralized.
An important issue in BI is to define the true state of the organization. How certain values are calculated will vary between groups. In a distributed environment, there is no guarantee that all systems will calculate the data in the same way. Even when there is agreement on the calculations, the source data may vary. The reasons for these differences may be as simple as differing refresh rates. Regardless of the cause, multiple systems may disagree with one another and yet be correct within their own context. C-level executives often meet to discuss the state of the organization, and half the meeting is spent finding agreement in the numbers. In a distributed data warehouse, where two systems may not agree, there is no single version of the truth. In a centralized system, there is a single arbitrator of the truth: the data warehouse. By definition, it contains the authoritative version of the data. We are all drinking Kool-Aid from the same canteen.
By distributing the data warehouse, we were also hoping to avoid the political battles in defining the metadata. By moving the process to the client or middle tier, all we have done is postpone the debate; we haven't eliminated it. We have also removed many of the advantages we would have gained by going to a large centralized data warehouse. In the centralized system, we can pre-aggregate the data for faster retrieval. In a distributed environment, every time we wish to sum the data across systems, we have to extract the data then perform our calculations. The distributed alternative is to pre-aggregate the data in an independent system within the middle tier. When we do this, we are creating the centralized data warehouse. The only difference is in the granularity of the data. All we have done is further complicate a complex problem.
Hardware and software advances have eliminated many of the forces that have driven people to attempt the implementation of a distributed data warehouse. For example, data warehouses up to 50 or 60 terabytes are no longer a problem. Advances in parallel database systems and storage arrays resolve the size issue easily. Pre-aggregated data reduces the need for full table scans, and bitmapped indexing enhances performance when such scans are necessary. For a more thorough discussion on how to structure your enterprise data warehouse to support very large databases, refer to my previous book, Object-Oriented Data Warehouse Design.
I am reminded of a scene in a movie where King Arthur is talking to Merlin. Merlin says to Arthur, “My days are drawing to a close. The old gods, the gods of stone, fire, and water will be no more. They will be replaced by the one God.” This is what is happening in the data warehousing world. Market forces are such that the days of the custom-built data warehouse are drawing to a close. As more vendors develop BI extensions to their applications, fewer organizations will opt for a custom system. A BI system built into an already existing application will probably not meet 100 percent of a company's BI needs. It will, however, meet roughly half those needs, and in many cases, somewhat more than half. As long as the BI environment is extensible, a prebuilt solution is only logical. After all, why should a C-level executive commit funds to a risky million-dollar data warehousing project when there is a prebuilt solution that is sure to work?
We will discuss this in more detail later in this chapter. In the interim, consider the repercussions of such a change in the industry. Fewer companies build custom data warehouses. Vendors that specialize in databases and tools specifically for the data warehouse market find their market share shrinking. This results in smaller research and development budgets. The vendors who are gaining market share are the vendors who integrate BI into their applications. Their R&D budgets expand, giving their products greater functionality. Eventually, the specialized tools and databases fall behind the technology curve, and their vendors either fold or get purchased by the larger vendors. In the end, the only data warehouses that will be developed are enterprise-based systems that are integrated into the applications.
3.2.3 THE OPERATIONAL DATA STORE
The Operational Data Store (ODS) is a little discussed and often misunderstood system. Although it is mistakenly seen by some as a substitute for a data warehouse, it is actually a complement to it. Whereas the data warehouse is strategic in nature, the ODS is tactical. The data warehouse provides a historical context in which to understand the organization's current environment. With a data warehouse, we attempt to detect past trends in order to predict possible future outcomes of strategic decisions. Not all analyses, however, require a historical perspective. Tactical decisions deal with immediate needs; they are focused on what is happening in the organization now. This is not to totally discount the importance of the past, but in many situations, we need to deal with short-term operational issues. This is the point of the ODS: to provide information for tactical situations in the same way that the data warehouse provides information for long-term strategic decisions.
Since tactical decisions deal with the immediate present, the data supporting these decisions must reflect as nearly as possible the current state of the organization. Due to the large volumes of data involved as well as the processing requirements of integrating data into the data warehouse, it is difficult to refresh a data warehouse on a real-time or even near real-time basis. Instead, we use an ODS. The ODS acts as the repository for the real-time information required to support tactical decision making.
How we integrate the ODS with the data warehouse is dependent upon the demands of the individual environment. Figure 3.7 presents the four classes of ODS, each with a varying level of complexity, depending on how closely integrated it is with the data warehouse. A Class 1 ODS is the simplest to construct. It is separate and independent of the data warehouse and contains a straightforward replication of the transactions carried out in the operational environment. The benefit here, of course, is that the data is integrated into a central repository much more easily than with a data warehouse. At the same time, we have all of our information located in one central system.
Figure 3.7. ODS classification.
Consider the implications of a Class 1 ODS. We have discussed how difficult it is to get a complete view of a customer or other objects in our organization's environment when the data for that object is distributed across many systems. As we can see in Figure 3.7, a Class 1 ODS, when properly implemented, eliminates this problem. All the data from the various systems is brought together in one place for analysis. The difficulty with this class of ODS is its lack of integration with the data warehouse. One of the benefits of a central data warehouse is that we now have one source for truth. As you may recall from our discussion of distributed data warehouses, multiple asynchronous systems will often return conflicting results.
Although it is a bit more complex to develop, a Class 2 ODS resolves the data synchronization issue. The Class 2 ODS acts as a staging area for data that is to be integrated into the data warehouse. Just as in the Class 1 ODS, transactions are replicated from the operational environment. In a Class 2 architecture, however, the data warehouse extracts its data from the ODS and not directly from the operational environment, as it does with a Class 1. This is a rather nice solution. It replaces the data warehouse ETL process in the operational environment with a simple transaction replication. The load of the ETL process is shifted to the ODS. A Class 2 ODS can be seen as straddling the operational transaction-oriented environment and the strategic BI environment. What is especially appealing about this solution is that it provides a tighter integration between the data warehouse and the ODS. The two systems work together, each sharing data, each supporting different types of decisions.
The Class 3 ODS reverses the integration of the two systems. Rather than integrating the data from the ODS into the warehouse, the warehouse data is used to update the ODS. As we can see in the figure, in terms of where it sits within the information infrastructure, the Class 3 ODS is very similar to a dependent data mart. Note that the Class 3 ODS data is not as current as the data in the Class 2 ODS, since it undergoes the additional processing of passing through the data warehouse. The Class 3 ODS is useful in environments in which we wish to distribute subsets of the data within the warehouse to specific communities. Finally, the Class 4 ODS establishes a two-way dialog with the data warehouse. Just as with the Class 2 ODS, the transactions in the Class 4 ODS are integrated into the data warehouse. The data warehouse in turn provides the ODS with aggregated data and analysis data.
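One way to keep the four classes straight is to reduce each to the direction in which data flows between the operational environment, the ODS, and the warehouse. The sketch below is simply a restatement of Figure 3.7 in that form.

```python
# A restatement of the four ODS classes in terms of data flow direction.
ODS_CLASSES = {
    1: "operational -> ODS; the ODS and the warehouse are fed independently",
    2: "operational -> ODS -> data warehouse; the ODS is the warehouse's staging area",
    3: "operational -> data warehouse -> ODS; the ODS is fed from the warehouse",
    4: "operational -> ODS <-> data warehouse; transactions flow up, aggregates flow back",
}

for cls, flow in sorted(ODS_CLASSES.items()):
    print(f"Class {cls}: {flow}")
```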
Personally, I am skeptical of the long-term benefits of the ODS as a separate and distinct structure. Advances in technology are continually improving ETL performance. Depending on the data refresh rates and the resource requirements of the ETL process, in many instances all decision support questions may be handled by the data warehouse itself. In addition, an enterprise class database is able to pre-aggregate data and provide tactical analyses directly on the operational systems. More and more analyses that cannot be supported by the data warehouse may be addressed by these types of solutions. It seems, in light of these technological advancements, to be increasingly more difficult to justify the construction of an ODS. I wonder if there is really a necessity to add yet another system to the pantheon of systems that comprise the average information infrastructure.
3.2.4 DECISION SUPPORT SYSTEMS (DSS)
One could loosely define DSS as the presentation layer of the data warehouse. It is important, however, to emphasize the word loosely. When we discuss DSS, we are looking at more than just the presentation of the data within the warehouse. As shown in Figure 3.1, DSS extends from the extraction of the data through the warehouse to the presentation of that data to the decision maker. To classify these tools as mere presentation vehicles would greatly underestimate their value. These systems come in a variety of flavors, each meeting different needs within the organization.
In defining the role of the decision maker, we noted that he or she exists at all levels of the organization, from the department manager all the way up to the CEO. Each has his or her unique information requirements. The higher one moves up in the organizational structure, the higher the level of summarization he or she uses on a daily basis. There are different categories of DSS tools to meet the requirements of these different levels. We have presented the spectrum of DSS tools in Figure 3.8.
Figure 3.8. DSS spectrum.
At the most rudimentary level, one could consider reporting a certain level of DSS. In comparison to the other levels of DSS, reporting is a passive consumption of data. This of course does not relegate it to the mere production of static green-bar reports. Today reporting systems are much more than that, as we shall discuss in the following subsection.
The next level of DSS is analytical; this level is filled by OLAP tools. Where reporting simply presents the data, analytical tools take the user further. OLAP tools allow the decision maker to interact with the data. Data mining extends this interaction with the data to a level of discovery. This is where new behaviors within the data are unearthed and explored. We will discuss each category of tools in the subsections to follow.
3.2.4.1 Reporting
When discussing BI, reporting is often not recognized as a DSS. Reporting, however, fills an important role. As we said earlier, the higher the level of the decision maker in the organization, the higher the level of summarization required. Today most higher levels of management are not interested in interacting with data. At the highest levels within organizations, most strategists are interested in simple dashboards: systems that will display leading key indicators of an organization's health. This will change as younger, more technically astute managers rise to the level of corporate leadership, but for now most corporate leaders are more than happy to receive simple reports. While they may see them as simple reports, today there is nothing simple about reporting.
If an organization's initial foray into DSS is reports-based, the data warehouse architect must think beyond simple static green-bar reports to the overall enterprise. Enterprise class reporting in today's world should simplify the creation, maintenance, and distribution of reports. The enterprise reporting tool should make it as easy as possible to get the data to whomever may require it. For this reason, an enterprise reporting tool should have the following capabilities:
-
Rapid Development— The tool should provide a wizard to walk the developer through the creation of the report. It should also allow the developer to view the report exactly as it will be seen.
-
Easy Maintenance— The tool should allow the user to modify reports through a wizard.
-
Easy Distribution— The report engine should be able to direct the same report or portion of a report to different media. For example, a portion of a report could be posted to a Web site, while another section is sent to management via email, and a third is sent via standard mail to stockholders. A short sketch of this render-once, distribute-many idea follows the list.
-
Internet-Enabled— The reports server should be able to receive requests from both Web-based and non-Web-based clients.
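The distribution requirement in particular lends itself to a short sketch. The section layout and the channel functions below are hypothetical placeholders for a real reporting engine's delivery mechanisms.

```python
# Hypothetical sketch of "render once, distribute many." Each section of a single
# report is routed to a different medium, as described in the list above.

def post_to_web(section):
    print(f"posted to portal: {section['title']}")

def email_to(section, recipient):
    print(f"emailed to {recipient}: {section['title']}")

def print_and_mail(section):
    print(f"queued for print and mail: {section['title']}")

report_sections = [
    {"title": "Regional Sales Summary", "audience": "web"},
    {"title": "Management Detail",      "audience": "email"},
    {"title": "Stockholder Letter",     "audience": "mail"},
]

for section in report_sections:            # one report, three different destinations
    if section["audience"] == "web":
        post_to_web(section)
    elif section["audience"] == "email":
        email_to(section, "management")
    else:
        print_and_mail(section)
```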
3.2.4.2 Online Analytical Processing
OLAP takes the decision maker to new levels in data analysis. With OLAP, the decision maker's analysis interacts with the data contained within the system. It leverages the time-variant characteristics of the data warehouse to allow the strategist to look back in time as well as into the future. In looking back, the strategist can identify trends that may be hidden in the data. In looking forward, these trends can be used to forecast future conditions. In addition, the characteristics of these trends can also be examined. The strategist can anticipate how possible changes in these trends will affect the organization's environment.
A variety of OLAP tools have been developed to achieve the objectives described above. Some of these tools are based on a multidimensional database, specially constructed and tuned for analytical processing. These tools are sometimes referred to as multidimensional OLAP, or MOLAP. There are, however, shortcomings to multidimensional systems. To overcome these obstacles, many OLAP tools have taken a ROLAP approach. ROLAP is relational OLAP, in which the multidimensional view of the data is implemented on top of some relational engine. There are also hybrid OLAP engines that are a combination of the two. These are frequently referred to as—you guessed it—HOLAP.
A complete discussion of OLAP could fill a book by itself, and that is not our objective here, but there are some points that will provide us with a general understanding of OLAP. Figure 3.9 presents a typical OLAP interface. Although it might appear to be a standard spreadsheet with rows and columns, an OLAP tool is much more powerful. OLAP allows the user to present the data in multiple dimensions. In our figure, we are presenting the sales data for automobile dealerships; the dimensions are time, product, and dealership. As we can see, time is spread across the columns, and the rows represent each product. Each page represents a different dealership. At an absolute minimum, OLAP must be able to present data in multiple dimensions at one time. Although our simple example shows three dimensions, the OLAP tool should be able to extend the presentation of the data to many more dimensions.
Figure 3.9. OLAP interface.
The OLAP tool should also allow for the rotation of data. A rotation changes the orientation of the display. The dimension that runs across the columns is exchanged with the row data, for example. In Figure 3.9, a rotation would distribute the products across the columns, and each row would represent a month. The key is that with OLAP this can be done easily through an intuitive interface.
Another important feature of OLAP is the ability to drill down and roll up data. This allows the user to look at summaries of data. In our example, we have shown data at the dealership level. Each cell in the matrix is the sales of that particular product for that particular dealership. Rollup allows the strategist to sum the data of the different dealerships into a single regional sales number for each product. Drill-down is the same operation in reverse. The drill-down operation allows the strategist to look at the detail records of a summary. If we were to present regional sales numbers, a drill-down would allow the user to look at the sales numbers of the individual dealerships in those regions.
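The operations described above can be imitated with a few lines of pandas; the library and the dealership data here are purely illustrative and are not the OLAP tool the text has in mind.

```python
# Illustrative only: pandas used to mimic the OLAP operations described above.
import pandas as pd

sales = pd.DataFrame({
    "month":      ["Jan", "Jan", "Feb", "Feb", "Jan", "Feb"],
    "product":    ["Sedan", "Truck", "Sedan", "Truck", "Sedan", "Sedan"],
    "dealership": ["North", "North", "North", "North", "South", "South"],
    "region":     ["East", "East", "East", "East", "East", "East"],
    "units":      [10, 4, 12, 5, 7, 9],
})

# One "page" of the cube: products down the rows, time across the columns.
page = pd.pivot_table(sales[sales.dealership == "North"],
                      values="units", index="product", columns="month", aggfunc="sum")

# Rotation: exchange the row and column dimensions.
rotated = page.T

# Roll-up: summarize the dealerships into a single regional number per product.
rollup = sales.groupby(["region", "product"])["units"].sum()

print(page, rotated, rollup, sep="\n\n")
```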
The point of OLAP is to give the decision maker the tools necessary to detect trends and analyze the characteristics of those trends. This includes the ability to perform what-if analysis. The OLAP tool must allow the strategist to build models based on the data and manipulate the variables in the model so that the strategist can examine both the effects of particular trends and how the changes in those trends will influence the business environment.
3.2.4.3 Data Mining
Not that long ago, hidden pictures were the latest rage. These were pictures of seemingly random patterns and colors that, when looked at just right, revealed a hidden picture. The trick was to look at the entire picture at once, and after several hours (and a headache), you were able to see this hidden picture. I never could see the hidden pictures. You see, I am color blind. Really. No kidding. So I could stare at those patterns all day and never see anything but a bunch of squiggles. Data mining, in a way, is similar to those pictures. The data as a whole seems to be nothing but a collection of random events. Data mining allows us to see the picture hidden within those events—color blind or not.
There are two basic types of data mining: classification and estimation. With classification, objects are segmented into different classes. In a marketing data warehouse, for example, we could look at our customers and prospects and categorize them into desirable and undesirable customers based on certain demographic parameters. The second type of data mining, estimation, attempts to predict or estimate some numerical value based on a subject's characteristics. Perhaps the decision maker is interested in more than just desirable and undesirable customers. The strategist may be interested in predicting the potential revenue stream from prospects based on the customer demographics. Such a prediction might be that certain types of prospects and customers can be expected to spend x percentage of their income on a particular product. It is common to use both classification and estimation in conjunction with one another. Perhaps the strategist would perform some classification of customers and then perform estimations for each of the different categories.
Whether performing a classification or an estimation, the process of data mining is basically the same. We begin with the data, or more appropriately, a subset of the data. This is our test data. The size of the data set depends on the variability of the data's characteristics. In other words, if there are relatively few variables whose values do not greatly deviate from one another, then we can test on a small number of records. If the data has many variables with many possible values, then the test data is much larger. As with the data warehouse, the data is cleansed and merged into one database. If we are working directly from a data warehouse, we would expect this process to have already been carried out. This does not mean that we assume the data is cleansed and transformed. The data quality must still be verified to ensure accurate results.
We then define the questions that are to be posed of the data. Despite the common misconception of data mining, the strategist must be able to define some goal for the mining process. Perhaps we would like to segment our market by customer demographics or we would like to know the market potential of certain economic groups. In either case, we need to specify what we want to discover.
Using the test data, we construct a model that defines the associations in which we are interested. We have known results in the test data set. We know that certain records in the data set represent desirable or undesirable customers, or we know the market potential of a set of clients. The model will look for similarities in the data for those objects with similar results. Once we have built the model, we train it against subsequent test data sets. When we are confident in the model, we run it against the actual data we wish to mine. At times the model will not include some records that should be included, or it will include records that it should not. In either case, there will be some level of inaccuracy in the data model. No model can predict with perfect accuracy, so we should expect some margin of error. The models come in a variety of types.
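As an illustration only, the following sketch uses scikit-learn to build one classification model and one estimation model from a tiny, invented test data set; the library and the demographic fields are stand-ins, not a recommendation of a particular mining tool.

```python
# Illustrative only: a classification model (desirable customer or not) and an
# estimation model (expected annual spend) trained on invented test data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# test data: [age, income] per customer, with known results
X_train = [[25, 30000], [40, 90000], [35, 60000], [52, 120000], [23, 28000]]
desirable = [0, 1, 1, 1, 0]                   # classification target (known results)
annual_spend = [300, 2200, 1500, 3100, 250]   # estimation target (known results)

classifier = DecisionTreeClassifier().fit(X_train, desirable)
estimator = LinearRegression().fit(X_train, annual_spend)

prospect = [[38, 75000]]                      # a new prospect to score
print("desirable?", classifier.predict(prospect)[0])
print("estimated spend:", round(estimator.predict(prospect)[0], 2))
```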
Decision Trees
Decision trees are a common modeling technique that has been used since before data mining was developed. In Figure 3.10, we see a typical decision tree. The decision is boxed, and each alternative is circled. The branches extending from each circle are labeled and assigned a probability. The company must decide whether to carry out a marketing campaign or invest the money where there would be a 10 percent return on the investment.
Figure 3.10. Decision tree.
Between the two alternatives in the tree, there are three possible outcomes. In executing the campaign, there is a 50 percent chance that sales will increase by $1,000,000. There is also a 40 percent chance that sales will be constant, and a 10 percent chance that sales will decrease by $100,000. We multiply the probability of each outcome by the value of that outcome. We then deduct from the sum of these outcomes the cost of the campaign. We do the same with the decision to invest the money. In calculating the value of the decision to invest, we add the benefit derived from the investment to the sum of the possible outcomes. We see from the tree that the decision to carry out the campaign has a value of $390,000, while the decision to invest the money has a value of $15,000. While the example shows two alternatives with three possible outcomes, decision trees can have many alternatives, outcomes, and subsequent decisions.
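The arithmetic for the campaign branch can be written out directly. The campaign cost is not stated in the text, so $100,000 is assumed here because it reconciles with the $390,000 value quoted above; the investment branch would be evaluated the same way, adding the investment return to its expected outcomes.

```python
# A worked version of the campaign branch of Figure 3.10.

campaign_outcomes = [          # (probability, change in sales)
    (0.5,  1_000_000),         # sales increase
    (0.4,          0),         # sales flat
    (0.1,   -100_000),         # sales decrease
]
campaign_cost = 100_000        # assumed; not given in the text

expected_outcome = sum(p * value for p, value in campaign_outcomes)   # 490,000
campaign_value = expected_outcome - campaign_cost
print(campaign_value)          # 390000.0
```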
Neural Networks
A neural network is an interesting approach to data mining. The structure of the model mimics the structure of the human brain. The brain is composed of neurons, each of which could be thought of as a separate processor. The inputs to the neuron are scattered over its dendritic tree. Axons make up the white matter of the brain. The axon is, in a sense, an output device for the neuron; the dendrite is the input. The axon passes the output of one neuron to the dendrite of the next. Each neuron processes the information it receives and passes its results on down the line. How this results in human thought is the subject of another book.
The neural network model attempts to perform this same kind of process. In the neural model, there are a number of nodes; these could be processors in a massively parallel processing system or simply processes in a multiprocessing system, as demonstrated in Figure 3.11. The network receives as input the location, age, gender, and income of the prospect. These are taken in by the neurons and, based on some algorithm, generate an output. These outputs are then added in a weighted sum to determine some final result, such as likely to buy or not to buy a particular product.
Figure 3.11. Neural network.
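A toy version of such a network is sketched below. The weights, bias values, and scaled inputs are invented; a real data mining tool would learn these parameters from the test data rather than having them hard-coded.

```python
# A toy version of the network in Figure 3.11: each "neuron" applies a simple
# function to its inputs, and the outputs are combined in a weighted sum.
import math

def neuron(inputs, weights, bias):
    # weighted sum of inputs pushed through a sigmoid "activation"
    return 1.0 / (1.0 + math.exp(-(sum(i * w for i, w in zip(inputs, weights)) + bias)))

prospect = [1.0, 0.45, 1.0, 0.62]     # location, age, gender, income (already scaled 0-1)

hidden = [
    neuron(prospect, [0.8, -0.3, 0.1, 0.9], -0.2),
    neuron(prospect, [-0.5, 0.7, 0.4, 0.2], 0.1),
]
score = neuron(hidden, [1.2, -0.6], 0.0)   # weighted sum of the node outputs

print("likely to buy" if score > 0.5 else "not likely to buy", round(score, 3))
```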
Genetic Modeling
Genetic modeling is well suited for categorizing. It comes from the concept of survival of the fittest—in this case, survival of the fittest model. We begin by randomly placing the data into the desired categories. The model then evaluates each member of each category based on some function that determines the fitness of the member. Members that are not well suited to a class are moved to other categories. The classes continue to alter themselves, even as new data arrives, until they arrive at the best fit.
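The reassignment idea can be sketched in a few lines. The fitness rule used here, closeness to the category's average income, is invented purely for illustration; this is not a full genetic algorithm, only the "move ill-fitting members until the classes settle" behavior described above.

```python
# A minimal sketch of the reassignment idea: members start in random categories
# and are repeatedly moved to the category they fit best.
import random

incomes = [21, 24, 26, 72, 75, 80, 140, 150]           # members to categorize
categories = {0: [], 1: [], 2: []}
for income in incomes:                                  # random initial placement
    categories[random.randrange(3)].append(income)

for _ in range(20):                                     # keep altering until stable
    averages = {c: (sum(m) / len(m)) if m else float("inf")
                for c, m in categories.items()}
    new_categories = {c: [] for c in categories}
    for members in categories.values():
        for income in members:
            best = min(averages, key=lambda k: abs(income - averages[k]))
            new_categories[best].append(income)         # move member to best-fitting class
    categories = new_categories

print(categories)
```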
3.2.5 BI APPLICATIONS
So far, we have discussed BI tools. There is a variety of tools in the software industry: compilers, code developers, documentation generators, and even testing environments. What they have in common is that they are used to build something in software. They are to the programmer what a hammer or saw is to the carpenter. They are not the product, the thing that is actually turned over to the end user. Reporting, OLAP, and data mining are tools. Some software engineer uses them to build something. For many years, these were the height of BI. As the BI industry matures, it is inevitable that we should proceed to the next level, to BI applications.
It is interesting to watch applications as they come to maturity. Whether it be procurement, payroll, human resources, or BI, each has followed the same basic path. This path consists of four distinct stages. Figure 3.12 presents these stages and their relationships to market penetration. In the first stage of development, the “custom” stage, every engagement is homegrown, built for that particular environment. At one time, you couldn't go out and buy a database system. Many of my early programming jobs involved building a database so that a custom application could sit on top of it. This is how BI began. When we first started developing BI applications, all we had was an operating system, a compiler, and a heck of a lot of courage. If we were really lucky, we had a database, but this wasn't always the case.
Figure 3.12. The stages of software maturity.
The second stage of maturity is the component stage. At this point, we still didn't quite have an application. What we saw in this era were packages of specialized tools that bolted together to form an application. In the BI world, these were the days when a number of independent vendors offered ETL tools, OLAP engines, and data mining suites. People took a best-of-breed approach to solutions. They would cobble together the fender of one auto with the hood and tail fins of another. After they threw in an engine they picked up somewhere, they called it a car. Thankfully, BI has just completed this stage of maturity.
The next stage is the shrink-wrapped application. This stage brought us from offering tools to offering actual applications. The BI market is now in this stage of development. Many Enterprise Resource Planning (ERP) vendors offer prebuilt data warehouses, complete with ETL processes and warehouse data structures. Sitting on top of these warehouses are the reports, OLAP data cubes, and data mining applications. These shrink-wrapped solutions offer enormous benefits to the end user. While there may be some tuning of the BI application required to meet specific needs, the bulk of the development costs are eliminated.
We should also clarify the use of the term tuning. There is a huge difference between tuning an application and customizing it. Customization is the changing of the software itself, replacing one procedure with another more tailored to a specific environment. Tuning is the process of taking what is already there and extending its use. By tuning we mean creating additional reports, adding elements to the data warehouse from external sources, or integrating the output into an executive dashboard. It is equivalent to setting up a chart of accounts in a financial system.
When we think of BI applications, we should not limit our horizons to a new way of applying OLAP or the integration of data mining into existing applications. Although these methods are valuable, there are new areas of analysis to explore. In the following subsections, we will examine two such applications: the balanced scorecard and activity-based costing. These are applications that fulfill our definition of BI, yet have their origins in the business community.
These stages of software maturity correlate well with the technology adoption life cycle described by Geoffrey Moore in Crossing the Chasm.5 This life cycle consists of five stages: innovators, early adopters, early majority, late majority, and laggards. The innovators are the same people who bought the first cell phones back when they were still attached to a dictionary-sized battery. Any new technology turns them on. This group bought into BI in the custom stage of the application maturity cycle. The second group, the early adopters, are the folks who are willing to take a calculated risk if they see the payoff. This group bought into BI during the component stage.
The third group in Moore's life cycle is the early majority. To succeed in the software industry, a technology must succeed with the early majority. To succeed with any group of customers, you must understand the needs and desires that drive them. The early majority group is risk-averse. They want to see a proven record of accomplishment before they make a commitment. The risk involved in building a data warehouse in the component days was too great to attract this group to BI. In the component stage, most data warehouse implementations failed. The shrink-wrapped stage, however, is exactly what they want to see. A BI vendor can say to these users, “Here is the application. Touch it, feel it, understand how it will fit within your specific environment.” The users get those all-important warm and fuzzies. They aren't buying some nebulous concept. They are buying something that can actually be demonstrated. A smart vendor will tell them, “You can get this specific return on your investment. I know this is a fact because here is a list of your top competitors who are getting that return. This solution has been proven in your own industry.”
In light of such a compelling case, more and more of the BI tools vendors still in the component stage will be forced from the market. They cannot meet the needs of the early majority. This leads us to the fourth stage, integration. As more ERP vendors integrate BI into their applications, the early majority begins to look on BI as just another part of the application. In addition to a significant cost reduction, integration eliminates the biggest obstacle to penetrating the early majority: risk. As integration tightens, specialty tools exit the market and the applications simply incorporate the new capabilities. BI is well on its way to this level of integration. Just as the Internet will become just another way of doing business, BI will become just another aspect of the application. Eventually, BI will be, as they say, table stakes. You can't play if you don't have it.
We will examine the next stage of the BI loop by reviewing two BI applications: balanced scorecard and activity-based costing. It is interesting that in the minds of many, these are not typically thought of as BI. Their inception actually lies outside the IT community, which perhaps is one of the reasons that they are so powerful. These concepts started with business people, people who had business problems to solve. They do fit nicely within the BI loop presented earlier. The applications extract data from the operational environment. The data is then cleansed and transformed to be presented to the decision maker for analysis. Based on this analysis, the decision maker then forms some course of action. Again we see the three A's: Acquire, Analyze, Act.
3.2.5.1 The Balanced Scorecard
Recent surveys have shown that most organizations do not have a written, well-defined strategy. Of those that do, a large percentage have no way of communicating that strategy to those who execute it, nor a means to measure how well the organization is performing against its objectives. All too often, organizations rely on financial reporting as the vehicle by which they evaluate their performance. Unfortunately, financial reporting makes a poor tool for strategic analysis. Typically, financial measures are lagging indicators. They tell us how we did, but not necessarily how we are doing. By the time our financials reflect a problem, it may already be too late to take corrective action.
The balanced scorecard is an application first described by Kaplan and Norton in their book The Balanced Scorecard.6 The scorecard views the health and well-being of the organization through more than its financial data. It views the organization according to the strategy established by the C-level executives and translates their vision into quantifiable measures. Figure 3.13 presents the structure of this translation process.
Figure 3.13. The Balanced Scorecard pyramid.
We begin at the top of the pyramid with the company vision: Where do the C-level executives see the organization in the next five years? What vision do they have for the organization? Where will they lead us? We are defining our journey's destination. Perhaps the vision is to be the recognized leader in our industry. In the automotive industry, we might see our company designing the next Rolls Royce, or in the writing instrument industry, crafting the next Mont Blanc. An alternate strategy would be to develop greater market penetration: We might envision our company producing a car that outsells Toyota or a pen that outsells BIC.
The next step is to define the strategy to achieve this vision. How are we going to realize the vision we have articulated? If we plan to be the industry leader, we would decide on a low-volume, high-quality strategy. In this case, we aren't as concerned with market penetration as we are with leading the market in product quality. We would focus on developing high-quality products and charging a premium for them. If our strategy is to penetrate the market, we would focus on high-volume, high-value products that deliver value at a low cost.
Quite often, the completion of these two steps alone will justify the balanced scorecard project. It is common for C-level executives to have some vague idea of where they want the organization to be in several years, yet to lack a clear vision—as some would call it, “that vision thing.” Others may have a vision, but no clear set of strategic objectives to achieve that vision. In creating a balanced scorecard, management is challenged to articulate both the vision and the strategy.
As we look at the strategy we will see certain themes emerge. These are the “doable” pieces of the strategy, the functions to be performed in order to enact the strategy. The market leader's low-volume, high-quality strategy would translate into producing high-quality products and developing high levels of customer satisfaction. We can see two themes here: production quality and customer satisfaction. We then take the next step and map these themes into the different perspectives by which we view the organization.
From these perspectives, we view how we are going to act on the strategy described in our strategic themes. A traditional Kaplan and Norton balanced scorecard views the organization's health from four different perspectives:
- Learning and Growth— This perspective focuses on how well the members of the organization are equipped to deliver on the strategy. Do they have adequate training and appropriate skills? Are they empowered to perform the assigned tasks?
- Internal/Business Processes— This perspective focuses on how well the internal processes of the organization can meet expectations. It evaluates the processes critical to attracting and retaining customers.
- Customer— This perspective focuses on how well we are satisfying our customers. It identifies our target markets and evaluates how successful we are in each.
- Financial— Finally, we evaluate how well the strategy of the organization is contributing to the overall financial well-being of the company.
The next step in the process is to define a set of Key Performance Indicators (KPIs). A KPI is a measure of the overall performance of the organization within a particular perspective. The novice scorecard designer is tempted to include a large number of measures. This actually defeats the purpose of the scorecard. A balanced scorecard is meant to provide a clear, concise view of the strategic position of the organization. Including measures that are not necessary or not indicative of the performance within a particular area will only cloud the issue. As a general rule of thumb, we should use three to four KPIs for each perspective.
If we plan to be the market leader, we would measure how well we are manufacturing high-quality items and satisfying our customers. In the internal processes perspective, we might define KPIs around our production process: In random quality testing, how many items were rejected? How many products required after-sale servicing? In the customer perspective, we might look at KPIs that indicate customer dissatisfaction: How many customer complaints were received in a given time period? How many products were returned by customers?
By examining each indicator, we can define where our performance is and isn't advancing our strategy. We might note that the number of rejects has risen sharply in the past month. This, of course, leads us to believe there is something wrong with our production process. After examining this process to determine the reason for the high rate of rejection, we might discover that the raw materials are of a lower quality. Perhaps we switched suppliers in an attempt to reduce production costs. Perhaps we installed new machines without providing adequate operator training.
As we look at the different perspectives, we can see that they are linked in a cause and effect relationship. We show this relationship in Figure 3.14. In order to achieve our financial objectives, we must satisfy our customers. In order to satisfy those customers, we must have the internal processes that will fulfill their needs and desires. In order to develop those processes, we must develop the skills of our internal staff to support those processes.
Figure 3.14. Relationships between scorecard perspectives.
In developing a balanced scorecard, we define the relationship between each KPI in each perspective. This is often referred to as a strategy-map view of the scorecard. Figure 3.15 presents a typical strategy map. We start with a simple enough objective: increase profitability. To do this, we must increase sales and reduce expenses. All of these objectives are within the financial perspective. Let's follow the path of increasing sales. We determine that the best way to increase sales is to increase the customer's lifetime value; this is an objective within the customer perspective. We then ask ourselves, What is the best way to achieve this objective? First, we must improve service so that current customers, even when they have a problem, will want to deal with our company. Second, we must become a market-driven company, allowing the needs of the market to drive product design. Third, we must develop a better understanding of our customers so that we know their needs and desires.
Figure 3.15. A balanced scorecard strategy map.
To fulfill the objectives in the customer perspective, we must achieve objectives in the internal processes perspective. We must develop a CRM system so that we can better understand our customers and a BI system so that we can understand the market in which we compete. As we move back through the strategy map, we see how one objective is a cause of another. The results of one objective affect the results of the next objective.
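One way to picture a strategy map in software is as a simple cause-and-effect graph, as in the sketch below. The objective names follow the discussion above, but the exact edges and the helper function are an illustrative reading of the text, not the structure of Figure 3.15 itself.

# A minimal sketch of a strategy map as a cause-and-effect graph: each
# objective lists the lower-level objectives that drive it. The edges are
# an illustrative reading of the discussion, not the book's figure.

strategy_map = {
    "Increase profitability":           ["Increase sales", "Reduce expenses"],
    "Increase sales":                   ["Increase customer lifetime value"],
    "Increase customer lifetime value": ["Improve service",
                                         "Become market-driven",
                                         "Understand our customers"],
    "Understand our customers":         ["Implement CRM system"],
    "Become market-driven":             ["Implement BI system"],
}

perspectives = {
    "Increase profitability": "Financial",  "Increase sales": "Financial",
    "Reduce expenses": "Financial",         "Increase customer lifetime value": "Customer",
    "Improve service": "Customer",          "Become market-driven": "Customer",
    "Understand our customers": "Customer", "Implement CRM system": "Internal Processes",
    "Implement BI system": "Internal Processes",
}

def drivers(objective, depth=0):
    """Walk back through the map, showing what must happen for an objective to be met."""
    print("  " * depth + f"{objective} [{perspectives.get(objective, 'n/a')}]")
    for cause in strategy_map.get(objective, []):
        drivers(cause, depth + 1)

drivers("Increase profitability")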
The strategy map clearly shows that financial measures are lagging indicators; they occur at the end of the process. If there is a problem, the cause happened months ago and the effect is just now being felt. The strategy map becomes the decision maker's early warning system. Our strategy map tells us that we need to increase customer lifetime value, but we see that the IT department is failing to implement the CRM system. We can see this before the delay has a detrimental effect on achieving our strategic objective, and we can correct the situation.
Figure 3.16 presents the balanced scorecard data flow. Data is extracted from the operational environment and used to calculate KPIs. These indicators are then grouped into the different perspectives. The decision maker reviews these perspectives to determine the state of the organization, taking corrective action when necessary. Compare this data flow to the one presented in Figure 3.1; they are essentially the same. The balanced scorecard is the embodiment of BI.
Figure 3.16. Balanced scorecard data flow.
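The data flow lends itself to an equally simple sketch: calculate KPIs from operational data, group them by perspective, and flag anything off target. The sample figures and targets below are assumptions chosen to echo the market-leader example; they are not taken from the text.

# A minimal sketch of the Figure 3.16 data flow. The operational figures,
# KPI formulas, and targets below are illustrative assumptions.

operational_data = {
    "units_tested": 500, "units_rejected": 35,
    "units_sold": 10_000, "units_returned": 120, "complaints": 85,
}

# KPI name -> (perspective, calculated value, target the decision maker expects)
kpis = {
    "Reject rate":     ("Internal Processes",
                        operational_data["units_rejected"] / operational_data["units_tested"], 0.05),
    "Return rate":     ("Customer",
                        operational_data["units_returned"] / operational_data["units_sold"], 0.02),
    "Complaints/1000": ("Customer",
                        1000 * operational_data["complaints"] / operational_data["units_sold"], 10.0),
}

scorecard = {}
for name, (perspective, value, target) in kpis.items():
    status = "OK" if value <= target else "ACTION NEEDED"
    scorecard.setdefault(perspective, []).append((name, round(value, 3), status))

for perspective, indicators in scorecard.items():
    print(perspective, indicators)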
This section introduced the concept of balanced scorecards. There are a variety of methodologies for scorecard development as well as a number of different scorecard structures. For example, the perspectives presented in this section are the traditional perspectives seen in many scorecard implementations. This should by no means imply that these are the only perspectives that are permissible. The beauty of the scorecard is that it is a very flexible, commonsense approach to understanding corporate strategy. To truly understand balanced scorecards, read Kaplan and Norton's The Balanced Scorecard.
3.2.5.2 Activity-Based Costing
Activity-based costing is another BI application that addresses the deficiencies of traditional financial reporting. There are two audiences for financial data. The first group includes financial analysts, creditors, investors, and other company stakeholders. They use traditional financial reports to understand the value of an organization. They depend on this data to determine whether the company is worth an investment or solid enough to be extended credit. For this reason, the way in which this data is reported is tightly regulated by a number of different agencies.
The second group interested in financial data is the managers and employees involved in the production and sale of products. Like any stakeholders in the organization, they of course want to understand the overall value of the company. This group has an additional need, however: a need for detailed cost information. This cost data is necessary for the efficient operation of the organization, to understand the cost of products and services. With this data, management has a way to understand how well the processes used to create products and services are contributing to the financial well-being of the organization.
Let's think about what this means to an organization. Suppose we are a company, Billy Boy Bowling Balls. We've been making bowling balls for years. When we first began making bowling balls, we were fabulously profitable, but over the years, things have changed. Bowling ball styles have changed. The way we manufacture and distribute the balls has changed. One of our nephews just graduated from college with a BS in computer science and is trying to figure out how to distribute the balls over the Internet directly to the consumer. With everything that has changed, we don't really know which brands of bowling balls are profitable and which aren't.
One of the reasons we are unsure lies in traditional accounting procedures. Direct costing systems ignore overhead costs. The rationale is that this cost data is fixed and is a small fraction of the overall costs of production. This assumption, however, is incorrect. There isn't any such thing as a fixed cost—just costs that may change over a longer period. Often, even that doesn't hold; we sometimes see fixed costs increasing at an even faster rate than variable costs. Also, fixed costs can actually be many times greater than direct variable costs.
When our company first started, bowling balls were all pretty much one color, black. While there were a variety of sizes, the processing was the same. As time went on, we discovered that our competition was doing different things with balls. Some were making balls in a variety of colors and designs. One competitor even introduced a line of clear balls with objects embedded inside, such as a man positioned to look as though he were turning somersaults as the ball rolled down the lane. In an effort to compete, we did market research on some of the brands. The manufacturing of these balls differed as well; some were very inexpensive to make, and others entailed a complex and expensive manufacturing process.
In a traditional accounting system, these additional costs would be lost. Some brands underwent extensive market research before production; others did not. If both brands sell for the same price, the brand that was researched is obviously less profitable. After research, we decided to sell our own line of clear balls. Due to the special characteristics of the ball, shipping costs for these balls were much higher than for a standard ball. In traditional accounting, these differences would also be lost.
Figure 3.17 demonstrates the problem graphically. Traditional accounting methods aggregate cost according to the structure of the organization. Traditional accounting sees the company as an organization chart. Cost is divided into neat little silos, each cost representing the cost behind each little box in the organization chart. The problem is that products aren't manufactured vertically, in neat little organizational silos. Production occurs across the structure, horizontally. Traditional accounting doesn't see a process; it sees an organizational structure.
Figure 3.17. Organization cost versus activity cost.
Activity-based costing looks at the cost horizontally. Imagine a factory in which raw materials go in one end and finished goods come out the other. If we were able to peel back the roof of this factory, we would be able to examine the steps in the transformation of the raw materials to finished goods. We can imagine the transformation process as a series of discrete steps. In our bowling ball example, the raw materials are prepared, then the balls are formed. The balls are then polished and passed on to the shipping department. Shipping packages the balls and ships them to our distributors. Each step in our manufacturing process is an activity. Every time we do something as part of our manufacturing process, we consume resources. Some of these resources include direct materials, such as the cost of the material to make the balls or the cleaning solution used to polish the balls. Other resources include labor and machines to form the balls and drill the holes.
We now have two sets of data. One set defines the steps in the manufacturing process. We call each step an activity. We also know the materials consumed by each activity. We can therefore calculate the cost of resources consumed by an activity each time that activity is performed. The sum of all the activity costs in the manufacturing of an individual product is the cost of making one unit of that product. The cost object is the product or service that we produce. The cost of making one unit of that cost object is the cost object unit cost.
To understand how we calculate the cost object unit cost, refer to Figure 3.18. We begin with two departments, production and packaging. The accounts associated with these departments are standard general ledger accounts. Both departments have an account to which labor is charged. The production department also has a machine account used to charge the cost of the machine that forms the bowling balls. The packaging department has a materials account used to charge the cost of shipping materials. The production department performs two tasks: forming and polishing the bowling balls. The packaging department is responsible for packing the bowling balls. Every month, the production department spends $100,000 on labor. The machine costs are $37,000. The packaging department spends $50,000 on labor and $26,000 on materials.
Figure 3.18. Calculating cost object unit cost.
Half of the labor in the production department goes into forming the balls, and the other half goes into polishing them after production. The one function of the forming machine is the production of balls. The only activity carried out by the packaging department is the packing of bowling balls. We would distribute, therefore, the costs recorded in each account proportionally to the activities carried out by each department. As shown in the figure, 50 percent of production labor goes to the forming activity and 50 percent goes to the polishing activity. In a similar fashion, the entire cost of the forming machine is assigned to the forming activity. Since the packaging department only packs the bowling balls, 100 percent of that department's labor and materials are assigned to the packing activity. In a given month, we can see that it costs us $87,000 to form the bowling balls. This is known as the total activity cost—the total that an activity costs us in a given period. Similarly, polishing has a total activity cost of $50,000, and packing has a total activity cost of $76,000.
Next, we determine how frequently we carry out each activity. This is the activity driver volume. In this example, the production department polishes bowling balls 20,000 times in a month, the same time period used for calculating our total activity costs. The activity driver volume for the polishing activity is therefore 20,000. The activity driver volumes for the forming and packing activities are the same, 10,000. If we divide the total activity cost by the activity driver volume, we know how much it costs us to carry out that activity just once. This is known as the activity rate. In the example, the total activity cost for forming is $87,000, and we perform that activity 10,000 times a month, so the activity rate is $8.70. It costs our company $8.70 to form one bowling ball. Similarly, polishing has an activity rate of $2.50, and packing has an activity rate of $7.60.
We now understand what it costs us to do what we do every time we do it. The next question is, How often do we have to do it? This is known as our consumption quantity—how often we perform a specific activity to produce one unit of product, one cost object. In this case, we want to know how often we form, polish, and package during the production of one bowling ball. The consumption quantity for forming and packaging is one; we do this once for every bowling ball we produce. Due to the nature of the high-quality materials used in our bowling balls and the desire to have the prettiest bowling balls around, we polish our bowling balls twice. The consumption quantity for polishing is two. We then multiply the consumption quantity by the activity rate. This is a bill item, a charge that is added to the cost of the cost object.
The final step in the process is to sum the cost of the bill items, the cost of direct materials, and the cost of other cost objects to get the total cost object unit cost. Notice that we have added the unit cost of another cost object into the unit cost of our bowling balls. In this example, we may decide to include with every ball that we ship a premium, top-of-the-line bowling glove. In addition to including the glove with our ball, we also sell it as a separate item. Since it is sold independently, we have decided to make it a totally independent cost object whose cost is included in the cost of our bowling ball.
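The sketch below works through the Figure 3.18 numbers end to end. The activity costs, driver volumes, and consumption quantities come from the example; the direct-material cost and the glove's unit cost are placeholders, since the example does not give those numbers.

# A minimal sketch of the cost object unit cost calculation. Activity figures
# follow the example; direct materials and the glove's cost are placeholders.

# activity: (total activity cost per month, activity driver volume per month)
activities = {
    "forming":   (87_000, 10_000),   # $50,000 labor + $37,000 machine
    "polishing": (50_000, 20_000),   # half of production labor
    "packing":   (76_000, 10_000),   # $50,000 labor + $26,000 materials
}

# how many times each activity is performed per bowling ball (consumption quantity)
consumption = {"forming": 1, "polishing": 2, "packing": 1}

activity_rates = {name: cost / volume for name, (cost, volume) in activities.items()}
bill_items = {name: activity_rates[name] * consumption[name] for name in consumption}

direct_materials = 6.00       # placeholder: resin, cleaning solution, etc.
included_cost_objects = 4.00  # placeholder: the bundled bowling glove

unit_cost = sum(bill_items.values()) + direct_materials + included_cost_objects

print(activity_rates)          # {'forming': 8.7, 'polishing': 2.5, 'packing': 7.6}
print(bill_items)              # {'forming': 8.7, 'polishing': 5.0, 'packing': 7.6}
print(round(unit_cost, 2))     # 31.3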
Keep in mind that what we have shown here is a simple example. There are many issues we did not discuss, such as yield and material cost calculations. In addition, we did this for only one product line. When we first described our bowling ball example, we said that we had come out with different lines of bowling balls, each of which is processed differently. We would run these calculations for each unique production process. We also noted that prior to going into production, we spent time researching the market. These costs would also be included in the calculations. For a more detailed discussion of activity-based costing, refer to the masters, Kaplan and Cooper, and their book Cost and Effect.7
We should be aware of the enormous power granted to us by understanding the relationship between the activities used to produce a product and the unit cost of that product. There are two outgrowths of activity-based costing. The first is Activity-Based Budgeting (ABB). All companies go through a budgeting process. With ABB, we follow the path described earlier, but in reverse. Suppose the sales department has forecast a demand for 20,000 bowling balls next month. Since our value chain is fully integrated over the Internet, we practice just-in-time manufacturing, which means we have no bowling balls in inventory. Working back through the activity rates, we budget our production department's labor cost at $200,000. Packaging will need to budget $52,000 for materials and $100,000 for labor. There is always the issue of capacity, but in this example, we will assume that we have the appropriate capacity to meet the demand.
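A rough sketch of the ABB arithmetic follows. The per-activity resource rates are derived from the monthly figures above; the forming machine is treated as a capacity cost and left out of the budget, which is an assumption on our part.

# A minimal sketch of activity-based budgeting: run the activity rates in
# reverse from the sales forecast. Only labor and materials are budgeted here;
# the forming machine is assumed to be a capacity cost.

forecast_balls = 20_000

# resource cost per activity occurrence, derived from the monthly figures
resource_rates = {
    "forming":   {"production labor": 5.00},                    # $50,000 over 10,000 formings
    "polishing": {"production labor": 2.50},                    # $50,000 over 20,000 polishes
    "packing":   {"packaging labor": 5.00,
                  "packaging materials": 2.60},                 # $50,000 and $26,000 over 10,000 packings
}
consumption = {"forming": 1, "polishing": 2, "packing": 1}       # occurrences per ball

budget = {}
for activity, rates in resource_rates.items():
    occurrences = forecast_balls * consumption[activity]
    for resource, rate in rates.items():
        budget[resource] = budget.get(resource, 0.0) + rate * occurrences

print(budget)
# {'production labor': 200000.0, 'packaging labor': 100000.0, 'packaging materials': 52000.0}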
We can also use activity-based costing to better manage our organization through Activity-Based Management (ABM). ABM is both operational and strategic. It is strategic in that it helps us do the right things. It is operational in the sense that it helps us determine how to do those things in the right way. With this understanding of the production process, we can evaluate each stage and redesign it to reduce costs. Perhaps we discover that it is unnecessary to polish our bowling balls twice and that the incremental cost of the second polish is not justified by any additional demand it creates.
Typically, companies determine price backwards. They first determine the cost of a product and then determine a price based on the return they desire on that investment. To determine the price of a successful product, we have to determine the customer's acceptable price range, not the company's. We then decide what margin we want. This leaves us with a maximum cost for producing the product. We can then analyze the proposed production process to determine if we can produce it for that cost. Since we are using activity-based costing, our numbers are more accurate and we can do a better job of predicting cost. Also, if the cost constraints can't be met, we can redesign the process.
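As a small illustration of working backward from price, consider the sketch below. The acceptable price and desired margin are assumptions, and the unit cost is carried over from the activity-based sketch above (which itself used placeholder material costs).

# A minimal sketch of working backward from the customer's acceptable price.
# The acceptable price and the desired margin are illustrative assumptions.

acceptable_price = 45.00          # what we believe the market will bear for this ball
desired_margin = 0.30             # margin we want on the selling price

maximum_unit_cost = acceptable_price * (1 - desired_margin)   # about $31.50 to spend on the product

abc_unit_cost = 31.30             # from the activity-based sketch above (placeholder-based)

if abc_unit_cost <= maximum_unit_cost:
    print(f"Process meets the target cost (${abc_unit_cost:.2f} <= ${maximum_unit_cost:.2f})")
else:
    print("Redesign the process or rethink the product")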
As we can see, activity-based costing fits squarely within the definition of BI. Our source data comes from the operational environment, typically from our financial system's general ledger. We take the data, store it, perform operations upon it, and deliver it to the decision maker. He or she then uses this information to define some course of action.
3.2.6 DECISION MAKERS
The decision maker is key to the BI loop. The decision maker takes information from the DSS and defines some course of action. Typically, we think of the decision maker as a member of our own organization, an individual working within the company, planning some strategy for the organization itself. At one time, this might have been true. In the Internet age, however, the definition of a decision maker broadens to include individuals outside the organization as well.
The first group of decision makers we should consider is our partners. We have gone into great detail discussing how important it is for organizations to use the Internet to integrate the entire value chain. We have also noted that this integration means an expansion of the BI system's scope. As we expand the scope of the BI system, we also expand the user community. We begin to see how sharing strategic information with our partners benefits our own organization. Perhaps we can integrate our CRM system with our partners' systems. As our suppliers have better insights into our customer base, they can team with our organization more effectively. We can also share KPIs out of our balanced scorecard. Suppliers can see how the materials they supply our company affect production. They can use that information to improve the quality of their materials, thus improving the quality of our products. This leads to greater market share and greater profitability for both companies.
The second group of decision makers is the customers themselves. This may seem strange at first, but consider the role played by the decision maker in the BI loop and it will become clear. The decision maker receives information from the DSS and makes some decision based on the information provided. Isn't this the case every time a user logs onto a Web site and is given purchasing recommendations? Isn't the recommendation engine part of the DSS, and doesn't the customer receive its output? Based on this output, isn't he or she making some decision that is being reflected in the operational environment? In the end, we should consider the customer not only a decision maker, but perhaps the most important decision maker in the entire process.
The third group of decision makers is the organization's employees—the individuals employed by our company and responsible for defining some course of action. This can extend from the group leader all the way up to the C-level executive. As we can see with the balanced scorecard, there is a need to share strategic information at every level within the organization.