In the Beginning...
In 1998, a group from IBM's Services organization came to our Research group with a problem. IBM Global Services manages the computer helpdesk operations of hundreds of companies. In doing so, they document millions of problem tickets: records that the helpdesk operator types in each time he or she interacts with a customer. Here is what a typical problem ticket looks like:
1836853 User calling in with WORD BASIC error when opening files in word. Had user delete NORMAL.DOT and had her reenter Word, she was fine at that point. 00:04:17 ducar May 2:07:05:656PM
Imagine millions of these sitting in databases. There they could be indexed, searched, sorted, and counted. But this vast data collection could not be used to answer the following simple question: What kinds of problems are we seeing at the helpdesk this month? If the data could be leveraged to do this analysis, then some of the more frequent tasks could potentially be automated, thus significantly reducing costs.
So why was it so hard to answer this question with the data they had? The reason is that the data is unstructured. There is no set vocabulary or language of fixed terms used to describe each problem. Instead, the operator describes the customer issue in ordinary everyday language, as they would describe it to a peer at the helpdesk operations center. As in normal conversation, there is no consistency of word choice, sentence structure, grammar, punctuation, or spelling in describing problems. So the same problem called in on different days to different operators might result in very different problem ticket descriptions. This kind of unstructured information in free-form text is what we refer to as "talk." It is simply the way humans have been communicating with each other for thousands of years, and it's the most prevalent kind of data to be found in the world. Potentially, it's also the most valuable, because hidden inside the talk are little bits and pieces of important information that, if aggregated and summarized, could communicate actionable intelligence about how any business is running, how its customers and employees perceive it, what is going right and what is going wrong, and possibly solutions to the most pressing problems the business faces. These are examples of the gold that is waiting to be discovered if we can only "Mine the Talk."
And so with this challenge began the journey that culminated in this book.