Today's Answers
How the 5 Questions are answered in today's world of corporate information varies to some degree based on the emphasis an organization places on the information. For example, organizations that place substantial priority on research and its resulting product development, inventions, or industry breakthroughs always make research data, its conclusions, and the required documentation well organized and available. Generally, these organizations have research libraries that are managed and controlled by librarians well versed in the cataloguing and retrieval of research data. Even when the research data is in a variety of formats (text documents, statistical studies, and application databases), the librarians generally know where everything is and have search mechanisms, including at least a company intranet.
Answers to the 5 Questions get a bit more complicated once a research effort results in product development. The types of data that we track and the reasons that we track them vary immensely. Using the pharmaceutical industry as an example, the scientific experiments are no longer the sole domain of the scientists; the results now move into clinical trials and may become part of a drug's marketing package. Federal law requires specific information (for example, type of research, profiles of participants, length of trials, and specific results) to support the drug approval process and to be available for inclusion in packaging of approved drugs. Consider the twist of purpose for the research data once the resulting drug (pharmaceutical product) is manufactured and marketed. Consider also how the original research data looks when it becomes part of the marketing information collection. Finally, consider how easy or hard it would be to include this information as part of the answer to Question 1, "What data do we have?" Most pharmaceutical organizations re-create the data, or at a minimum copy it over, with each phase of the drug development process: new purpose, new software, new database, ... new metadata?
What about organizations that do not have a research arm? Consider retailers without manufacturing, whose products are acquired from wholesalers or are packaged only within the retail organization. The eventual product line may not represent the items purchased from the supplier (wholesaler, distributor, manufacturer). The typical order-entry system contains the full array of products, with variations (colors, sizes, shapes, and so on) and pricing options (single unit, altered unit, customized unit, bulk orders, and so on). If an item is for sale, the data surrounding the sale is available ... or is it? In the domain of the order-entry system, the data a user or customer wants is available. Again, remember that information and data can be distinct and variable. If the results of this order-entry system were going to be captured for the purposes of decision support, however, some of the required information (for example, answers to the remaining four questions) would not be obvious. Furthermore, depending on the purpose of the information gathering, some information requirements may constitute information above and beyond the data itself (consider the fourth question, "How did it get there?").
What if I am a developer looking for the specifics on how to translate order-entry data to meet the needs or demands of the customer, whether an individual or a commercial customer? Generally, answers to the 5 Questions share similar qualities. They are narrowly focused and address the perspective of the individual or group that supplied the answers. Even within data management organizations, the answers to these questions reflect a narrowed scope based usually on the inability to physically address specific islands of data (depending on, for example, their physical platforms or surrounding software packages). Ironically, even data management organizations that implemented metadata repositories, data dictionaries, or information directories restricted the boundaries of coverage in most cases. These efforts, originated in the IT department, are usually not geared toward business users' requirements, access, or relevance.
Tunnel vision is also obvious when we address the answers to each question from the viewpoint of today's information implementations.
What Data Do We Have?
A true answer to this question would result in a card catalog of an organization's data organized by major subject area. In today's common implementations, business users are forced to go right to the data they seek via shortcuts on their screens. But if there is a need to go beyond that immediate scope (and there always is), generally phone calls are the only way to find out exactly what other data is available.
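As a rough illustration of the "card catalog" idea, the following minimal sketch groups a list of data stores by subject area. The subject areas, store names, and the `build_catalog` helper are all hypothetical and serve only to show the shape such a catalog might take.

```python
from collections import defaultdict

# Hypothetical inventory: (subject area, data store) pairs.
# None of these names come from a real system.
inventory = [
    ("Customer", "order_entry.customers"),
    ("Product", "order_entry.products"),
    ("Customer", "marketing.contacts"),
]


def build_catalog(entries):
    """Group data stores under their subject area, like cards in a catalog drawer."""
    grouped = defaultdict(list)
    for subject_area, store in entries:
        grouped[subject_area].append(store)
    # Sort areas and stores so the catalog reads the same on every run.
    return {area: sorted(stores) for area, stores in sorted(grouped.items())}


catalog = build_catalog(inventory)
for area, stores in catalog.items():
    print(area, stores)
```

With a catalog of this kind, a business user could browse by subject area instead of relying on shortcuts and phone calls, provided that someone keeps the inventory current.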
What Does It Mean?
If an organization is progressive and methodical, a "data dictionary" is created by documenting each data element at the time it is defined, which typically happens during a data modeling phase, but this represents an ideal situation. Many organizations purchase data processing or information management software and have no data planning or data management function associated with its installation. Likewise, the definition of the data elements within the package is left to off-the-shelf vendor values. The majority of organizations, however, do not have data dictionaries. In these scenarios, the details behind the data elements are typically tracked individually (by the business users or by the developers). Many databases start springing up, each containing a miniworld of metadata, ranging from data element names and definitions to data tracking documentation, many of which include substantial information about the source of the data. In reality, many organizations do not have official databases that track these things; as a result, many "databases" are implemented, each of which represents a particular perspective of the data's meaning, and none of them maintained from a global or central framework.
Where Is It?
The answers to this question depend directly on what "it" is that we are looking for and who we are. The answers also depend on what data an organization is aware of and how easy that data is to find. Since client/server computing took the reins off controlled data definition and storage, it is virtually impossible to locate all of an organization's data. For example, if "policy" data is not associated with a "policy" term, the only way to find all policy data is to look for it after all data has been categorized manually. Of course, computer search engines will find instances of the word "policy" in all kinds of data stores, but the word has to be there. In today's corporate world, the answers to the question can vary depending on the type, location, and name of the data.
Type of Data
Ironically, it is fairly easy to find nonoperational data such as documents, application programs, and presentation graphics, assuming that the names are reasonable and the contents contain the main qualifier. Operational data, which is generated by transactions and the supporting processing, is typically located based on its identity. Unfortunately, the names of these data elements can range from accurate logical business names to short, programmer/DBA-developed identifiers to software package vendor identifiers. Based on this common problem, many "scanners" now exist that search physical data stores based on contents.
Location of Data
Where data resides determines how easy it is to find. If it exists in a structured database that has its own metadata-based directory and that is accessible via a shortcut on everyone's desktop, then, sure enough, it will be found. If it is sitting in a privately developed desktop database that resulted from one business user's design and requirements, it may never show up in anyone's search. If it is on another server that can be remotely accessed and it has a common term as part of its name, it may turn up in an access query. Finally, if it is a specific type of file with its own access procedure, as populated in a standard model (see Figure 3), it will be located. Only you know how likely these situations are in your organization, but, unfortunately, not everyone who looks for data understands that, even though data exists, it may never be found.
Data Association/Data Naming
Of course, if the data you are looking for concerns customer addresses and it is named "Party Location," how likely is it that you will find it? Without predicting odds, organizations geared toward data location and fulfillment usually offer sets of standard names and aliases as well as standard subject-area categories to help identify data that may not have obvious names. Again, if the rules exist but they are not followed, or people are not even aware of them, the effort results in more harm than good. One of the worst things that can happen is the return of a subset of information when the receiver is not aware that he or she is seeing a subset.
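The role of standard names and aliases can be sketched as a simple lookup. This is only an illustrative sketch: the alias table, its entries, and the `find_standard_names` function are assumptions made for the example, not part of any particular repository product.

```python
from typing import Dict, List

# Hypothetical alias table: standard name -> aliases a searcher might type.
ALIASES: Dict[str, List[str]] = {
    "Party Location": ["customer address", "client address", "mailing address"],
    "Party": ["customer", "client", "policyholder"],
}


def find_standard_names(term: str) -> List[str]:
    """Return every standard name whose own name or aliases equal the search term."""
    needle = term.strip().lower()
    matches = []
    for standard_name, aliases in ALIASES.items():
        candidates = [standard_name.lower()] + [alias.lower() for alias in aliases]
        if needle in candidates:
            matches.append(standard_name)
    return matches


print(find_standard_names("Customer Address"))   # found through an alias
print(find_standard_names("shipping address"))   # no alias registered, nothing found
```

The second call shows the risk described above: when an alias was never registered, the search returns nothing, and the searcher has no way of knowing that the data exists under another name.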
How Did It Get There?
The answer to this question can be extremely simple or very broad. What one will never know, however, is whether a simple answer should really be broader. The advent of data warehousing began to put much emphasis on the "source" of data, yet, in most major corporations, "data webs" have existed for quite some time, in many cases unbeknownst to the data analyst or documenter. The answer to this question is often restricted to the amount of information capable of fitting into a predetermined and presized Source field. If a particular data element has more than one source, it is up to the documentation specialist to determine how to report all of them or, in most cases, to choose the most recent source of that data element. In neither scenario does the information do any good for the data recipient because the true beginnings of the data are never documented. As a result, the business analyst will never know, for example, that "Gross Income" started from the Accounting System and was then massaged three times through three different accounting spreadsheets before it reached the report being documented.
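The "Gross Income" example can be sketched as a small lineage record to show what a single Source field loses. The `LineageStep` class, the step labels, and both helper functions are hypothetical; the spreadsheet names are placeholders, since the text does not name them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineageStep:
    system: str  # where the value lived at this step
    action: str  # what happened to the value there


# Hypothetical lineage for "Gross Income", following the example in the text.
gross_income_lineage: List[LineageStep] = [
    LineageStep("Accounting System", "originated"),
    LineageStep("Accounting spreadsheet 1", "massaged"),
    LineageStep("Accounting spreadsheet 2", "massaged"),
    LineageStep("Accounting spreadsheet 3", "massaged"),
    LineageStep("Report", "reported"),
]


def single_source_field(lineage: List[LineageStep]) -> str:
    """Mimic a presized Source field: keep only the most recent source before the report.

    Assumes the last step is the report itself and at least one source precedes it.
    """
    return lineage[-2].system


def full_path(lineage: List[LineageStep]) -> str:
    """Spell out every hop from the true origin to the report."""
    return " -> ".join(step.system for step in lineage)


print(single_source_field(gross_income_lineage))
print(full_path(gross_income_lineage))
```

Under these assumptions, the single field reports only the last spreadsheet, while the full path reveals the Accounting System origin and the three intermediate transformations.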
How Do I Get It (Go Get It for Me!)?
Because information gathered in response to this question is often either supplied by a technical person or is a "freebie" (that is, software provides the capability), this is probably the only question that appears to be answered accurately on all accounts. As mentioned earlier, operating systems are typically based on an underlying model that provides point-and-click access to anything that has been defined consistently and in conformance with certain rules (see Figure 3).
Ironically, however, the more accurate this answer is, the less likely it is to be of service. This is the reason the question is answered to begin with. Historically, if someone looked for and actually found data, the way it was found was important. In today's world, however, someone can receive data without any idea how it was accessed. In fact, the route to the data is often "hidden" or "included" as part of the software being used to locate the data. Only when we cannot find the data or, more often, have no choice as to what data to use or what software assists with locating it do we make a point of learning how to access it. More and more organizations and vendors are focusing on documenting the path to and method of obtaining data. Nevertheless, in most scenarios, the resulting documentation represents an isolated subset of the 5 Questions or all of the 5 Questions for an isolated group of data (see Figure 4).
Figure 4 Viewing the Five Questions in isolation.