Characterizing Big Data
One of the most common ways to define Big Data is in terms of the 3 V’s: volume, variety, and velocity. As the term implies, anything falling under the umbrella of Big Data is large in size. Some have taken this to mean anything that is too voluminous to be stored on a desktop computer. Multiple students have expressed interest in studying Big Data, which I’ve generally taken to mean that they are interested in using software that can handle more rows than a single spreadsheet in Microsoft Excel can accommodate. (Excel currently allows 1,048,576 rows per worksheet, a vast improvement over the 65,536 rows of earlier versions of the software.) In addition to the size of the data being stored, Big Data often comprises a variety of formats. On top of the quantifiable data (or “structured data”), Big Data encompasses “unstructured data” such as text comments, images, and multimedia file types.
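To make the spreadsheet yardstick concrete, the short Python sketch below counts the rows in a delimited text file and checks whether they would fit on a single Excel worksheet. The function names and the tiny in-memory sample are hypothetical and exist only for illustration; the two limits are the worksheet row counts of the current and the older Excel file formats.

```python
import csv
import io

# Worksheet row limits: 2**20 for the current .xlsx format,
# 2**16 for the older .xls format.
EXCEL_MAX_ROWS = 2 ** 20    # 1,048,576
LEGACY_MAX_ROWS = 2 ** 16   # 65,536


def count_rows(handle):
    """Count every row in a CSV stream, header included (hypothetical helper)."""
    return sum(1 for _ in csv.reader(handle))


def fits_in_excel(n_rows, limit=EXCEL_MAX_ROWS):
    """Return True if n_rows fits on one worksheet with the given limit."""
    return n_rows <= limit


# Invented sample data standing in for a real export.
sample = io.StringIO("id,comment\n1,great\n2,slow\n")
n = count_rows(sample)
print(n, fits_in_excel(n))
```

By this measure a file is "too big for Excel" as soon as its row count exceeds the limit, which says nothing yet about the variety or velocity of the data.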
Perhaps the most important of the 3 V’s is velocity. It is the velocity with which the data are collected and must be processed that can separate Big Data problems from those that simply involve large quantities of data. One illustration of problems that require data to be collected and processed rapidly is real-time marketing. Whether it is an auction to determine the order in which advertisements will appear alongside search results or targeted messages based on an individual’s mobile browsing history and current location, such problems require that data be processed rapidly.
Walt Disney World maintains an underground bunker called the Disney Operational Command Center to ensure that the theme park operates smoothly. Its staff may attempt to increase the speed with which visitors are moving through the queue of a ride if they find that it is too long, or dispatch costumed employees to entertain guests while they wait.8 If visitors are moving through rides more efficiently and having a more pleasant experience at the park, they’re apt to have more time during which they can visit stores and restaurants in the amusement parks, which provides Disney with additional revenue from its visitors. Phil Holmes, vice president of the Magic Kingdom, noted that “if we can increase the average number of shop or restaurant visits, that’s a huge win for us.”
While the majority of cases discussed in this book focus on marketing applications, perhaps one of the most important applications of Big Data today is security. Although Disney’s objective may be to minimize wait times or maximize visitors’ expenditures, imagine if we could apply similar approaches to law enforcement. Consider the FBI’s development of its Next Generation Identification (NGI) system, which will explore the use of facial recognition tools.9 Such a system could be used after crimes have been committed, comparing the footage from security cameras to databases that have been compiled previously. If such technology could be deployed more rapidly, it has the potential to reduce the resources that need to be committed to pursuing offenders. For such tasks, time is of the essence. Faster processing of the available data can contribute not only to cost reductions but also to improvements in public safety.
Although the 3 V’s are common to problems that fall under the auspices of Big Data, they ignore at least two other critical factors. First, there is the question of the veracity of the data. No one questions that there is a lot of data available, but organizations trying to cut through the noise and identify the signal must ask themselves if they can trust the different data streams available. When decisions are being based on the results of data analysis, the findings are meaningless if the data on which they are based are biased in some way.
This has been one of the concerns raised by marketers about the potential use of social media data. Although such data are generally available, do the comments scraped off the Web reflect the thoughts of a brand’s entire customer base? Neglecting to account for known biases in social media data could contribute to problems such as overestimating the importance of an issue to consumers or failing to capture shifts in brand sentiment.10
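A small Python sketch shows how such an overestimate can arise and how one common correction, reweighting each customer segment by its share of the actual customer base, changes the answer. Every number and segment label below is invented for illustration, and the two function names are hypothetical; this is a sketch of the general idea under those assumptions, not a description of any particular firm's method.

```python
# Hypothetical figures, invented for this example.
# segment: (share of customer base, scraped comments, comments mentioning the issue)
segments = {
    "18-29": (0.20, 600, 300),
    "30-49": (0.40, 300, 60),
    "50+":   (0.40, 100, 10),
}


def raw_share(segments):
    """Share of all scraped comments that mention the issue, ignoring who wrote them."""
    total = sum(n for _, n, _ in segments.values())
    mentions = sum(m for _, _, m in segments.values())
    return mentions / total


def weighted_share(segments):
    """Weight each segment's mention rate by its share of the customer base."""
    return sum(pop * (m / n) for pop, n, m in segments.values())


print(f"raw estimate:      {raw_share(segments):.2f}")
print(f"weighted estimate: {weighted_share(segments):.2f}")
```

In this invented case the youngest segment writes most of the comments, so the raw figure (0.37) overstates how much the issue matters to the customer base as a whole (0.22 once the segments are reweighted). The correction is only as good as the segment shares and assumes that commenters within a segment resemble those who stay silent.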
Second, and more important, is the value of the data. Many organizations talk about having a Big Data strategy. If they’re referring to a plan to warehouse and access data relevant to their organization, there’s nothing wrong with this statement. In fact, more organizations would probably benefit from having a well-thought-out strategy that integrates the IT function with the appropriate business processes. The problem is, though, that they’re often not referring to how data will be stored and made available to users. Instead, they’re using the term “Big Data” as a crutch. Rather than thinking through what they are trying to achieve and gathering data that are appropriate to addressing those goals, they believe that they have a foolproof strategy: track “everything.”