A Longitudinal View of Analytics
Because of its recent buzz and popularity, many people ask whether analytics is something new. The short answer is that it is not, at least not in the true meaning of what analytics stands for. One can find references to corporate analytics as far back as the 1940s, during the World War II era, when more effective methods were needed to maximize output with limited resources; most of the optimization and simulation techniques were developed then. In fact, analytics (or, as they were then called, analytical techniques) had been used in business since the early days of the time-and-motion studies initiated by Frederick Winslow Taylor in the late 19th century, and Henry Ford measured the pacing of his assembly lines, which led to mass-production initiatives. But analytics began to command more attention in the late 1960s, when computers were used in decision support systems. Since then, analytics has evolved with the development of enterprise resource planning (ERP) systems, data warehouses, and a variety of other hardware and software tools and applications.
The timeline depicted in Figure 1.3 shows the terminology used to describe analytics over the past sixty years. During the early days of analytics, prior to the 1970s, there was very little data, often obtained from domain experts through manual processes (interviews and surveys), and it was used to build mathematical or knowledge-based models for solving constrained optimization problems. The idea was to make the best of limited resources. These decision support models were generally known as operations research (OR). Problems that were too complex to solve optimally (using linear or nonlinear mathematical programming techniques) were tackled with heuristic methods such as simulation models.
FIGURE 1.3 A longitudinal view of the evolution of analytics.
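To make the flavor of those early OR models concrete, the following is a minimal sketch of a resource-allocation linear program, maximizing profit from two products under limited machine hours and raw material, solved in Python with scipy. The products, coefficients, and limits are hypothetical, purely for illustration.

```python
# A minimal resource-allocation linear program of the kind early OR
# addressed: maximize profit from two products under limited machine
# hours and raw material. All numbers are illustrative assumptions.
from scipy.optimize import linprog

# Maximize 20*x1 + 30*x2  ->  minimize -20*x1 - 30*x2
c = [-20, -30]                 # negated profit per unit of each product
A = [[1, 2],                   # machine hours consumed per unit
     [3, 2]]                   # raw material consumed per unit
b = [40, 60]                   # available machine hours and material

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)         # optimal plan (10, 15) and profit 650
```

Even this toy problem captures the core OR theme of the era: doing the best possible with scarce resources, expressed as an objective function and a set of constraints.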
In the 1970s, in addition to the matured OR models that were by then used in many industries and government systems, a new and exciting line of models emerged: rule-based expert systems (ES). These systems promised to capture experts’ knowledge in a machine-processable form (for example, a collection of if-then rules) so that it could be used for consultation in much the same way one would use domain experts to identify a structured problem and prescribe the most probable solution. That way, scarce expertise could be made available wherever and whenever it was needed, through an “intelligent” decision support system. Also during the 1970s, businesses created routine reports to help inform decision-makers (managers) about what had happened in the previous day, week, month, or quarter. Although it was useful to know what had happened in the past, managers needed more than what was available: a variety of reports at different levels of granularity to better understand and address the changing needs and challenges of the business.
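The following toy consultation illustrates the rule-based idea: domain knowledge encoded as if-then rules, checked against the facts of a case to prescribe a likely action. The troubleshooting rules and fact names are hypothetical, not drawn from any actual expert system.

```python
# A toy rule-based consultation in the spirit of 1970s expert systems:
# expert knowledge encoded as if-then rules, applied to observed facts.
# Rules and facts are hypothetical, purely for illustration.
rules = [
    ({"engine_cranks": False, "battery_ok": False}, "replace_battery"),
    ({"engine_cranks": True, "fuel_ok": False}, "refuel"),
    ({"engine_cranks": True, "fuel_ok": True}, "check_spark_plugs"),
]

def consult(facts):
    """Return the first recommendation whose conditions all hold."""
    for conditions, action in rules:
        if all(facts.get(key) == value for key, value in conditions.items()):
            return action
    return "refer_to_human_expert"

print(consult({"engine_cranks": True, "fuel_ok": False}))  # -> refuel
```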
The 1980s saw a significant change in the way organizations captured business-related data. The old practice of maintaining multiple disjointed information systems, each tailored to capture the transactional data of a single organizational unit or function (accounting, marketing and sales, finance, manufacturing), gave way to the integrated, enterprise-level information systems we commonly call ERP systems today. The old, mostly sequential and nonstandardized data representation schemas gave way to relational database management (RDBM) systems. These systems made it possible to capture, store, and relate organizational data fields to one another more effectively while significantly reducing the replication of information. The need for RDBM and ERP systems emerged when data integrity and consistency became an issue that significantly hindered the effectiveness of business practices. With ERP, all the data from every corner of the enterprise was collected and integrated into a consistent schema so that every part of the organization would have access to a single version of the truth when and where it was needed. In addition to the emergence of ERP systems, or perhaps because of them, business reporting became an on-demand, as-needed practice: decision-makers could create a specialized report to investigate organizational problems and opportunities whenever they needed or wanted to.
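As a minimal sketch of the relational idea, the following uses Python’s built-in sqlite3 module to show how customer details can live in a single table that order records merely reference, reducing replication and giving every report the same single version of the truth. The table and column names are hypothetical.

```python
# A minimal sketch of the relational idea behind RDBM/ERP systems:
# customer details live in one table, and orders only reference them,
# so nothing is replicated and every report sees the same record.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme Corp');
    INSERT INTO orders VALUES (100, 1, 250.0), (101, 1, 75.5);
""")

# Any department joins back to the same customer row: one version of truth.
for row in con.execute(
        "SELECT c.name, SUM(o.amount) FROM orders o "
        "JOIN customers c ON c.id = o.customer_id GROUP BY c.name"):
    print(row)   # ('Acme Corp', 325.5)
```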
In the 1990s, the need for more versatile reporting led to executive information systems (decision support systems designed and developed specifically for executives and their decision-making needs). These systems were designed as graphical dashboards and scorecards, serving as visually appealing displays while focusing on the most important factors for decision-makers to track: key performance indicators. To make this highly versatile reporting possible while keeping the transactional integrity of the business information systems intact, organizations created a middle data tier as a repository to specifically support business reporting and decision-making. This new tier is called the data warehouse (DW). In a short time, most medium-sized to large businesses adopted data warehousing as their platform for enterprise-wide decision-making. Because the dashboards and scorecards drew their data from the DW, they did not hinder the efficiency of the business transaction systems, mostly referred to as ERP systems.
In the 2000s, these DW-driven decision support systems came to be called business intelligence (BI) systems. As longitudinal data accumulated in the DWs, so did the capabilities of hardware and software to keep up with the rapidly changing and evolving needs of decision-makers. Driven by a globalized, competitive marketplace, decision-makers needed the most current information in a digestible form to address business problems and take advantage of market opportunities in a timely manner. Because the data in a DW is updated only periodically, it does not reflect the latest information. To alleviate this information latency problem, DW vendors developed systems that update the data more frequently, which led to the terms “real-time data warehousing” and, more realistically, “right-time data warehousing,” which differs from the former by refreshing each data item according to how fresh it needs to be. (Not all data items need to be refreshed in real time.) Because the data collected in a DW was large and feature rich, emerging computational techniques such as data mining and text mining became popular for “mining” corporate data to “discover” new and useful knowledge nuggets that improve business processes and practices. With the increasing volume and variety of data, the need for more storage and more processing power emerged. While large corporations had the means to tackle this problem, small to medium-sized companies looked for financially more manageable business models. This need led to service-oriented architectures and software- and infrastructure-as-a-service (IaaS) analytics business models, through which smaller companies gained access to analytics capabilities on an as-needed basis, paying only for what they used instead of investing in financially prohibitive hardware and software resources.
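The “right-time” refreshing policy can be sketched in a few lines: each data item carries its own freshness requirement, and only the items whose window has expired are reloaded. The item names and intervals below are hypothetical.

```python
# A sketch of "right-time" refreshing: each DW item has its own
# freshness requirement, and only stale items are reloaded.
# Item names and intervals (in seconds) are illustrative assumptions.
import time

refresh_interval = {"inventory": 60, "sales_summary": 3600, "headcount": 86400}
last_refreshed = {item: 0.0 for item in refresh_interval}

def items_to_refresh(now):
    """Return the DW items whose freshness window has expired."""
    return [item for item, interval in refresh_interval.items()
            if now - last_refreshed[item] >= interval]

print(items_to_refresh(time.time()))  # initially, every item is stale
```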
In the 2010s, we have seen, and are still seeing, yet another paradigm shift in the way data is captured and used. Largely owing to the widespread use of the Internet, new data-generation mediums have emerged. Of all the new data sources, including RFID tags, digital energy meters, clickstream Web logs, smart home devices, and wearable health-monitoring equipment, perhaps the most interesting and challenging is social network/media data. Even though it is rich in information content, the analysis of such unstructured data poses significant challenges to computational systems from both software and hardware perspectives. Recently, the term “Big Data” was coined to highlight the challenges these new data streams have brought upon us. Many advancements in both hardware (massively parallel processing with huge computational memory and highly parallel multiprocessor computing systems) and software/algorithms (Hadoop with MapReduce, and NoSQL) have been developed to address the challenges of Big Data.
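To illustrate the programming model that Hadoop popularized, here is a minimal word count in the MapReduce style: a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. On a real cluster the framework distributes these steps across many machines; this single-process sketch only shows the logic, and the input lines are hypothetical.

```python
# A minimal word count in the MapReduce style popularized by Hadoop:
# map emits (word, 1) pairs, a shuffle groups them by key, and reduce
# sums each group. A real cluster distributes these steps across nodes.
from collections import defaultdict

def map_phase(line):
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(word, counts):
    return word, sum(counts)

lines = ["big data needs big tools", "data tools evolve"]

# Shuffle: group the mapped pairs by key, as the framework would.
groups = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        groups[word].append(count)

print(dict(reduce_phase(w, c) for w, c in groups.items()))
# {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'evolve': 1}
```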
What the next decade will bring, and what new terms will be used to describe analytics, is hard to predict. The time between paradigm shifts in information systems, and particularly in analytics, has been shrinking, and this trend will likely continue for the foreseeable future. Today, the reality is that even though analytics is not new, the explosion in its popularity is. With the recent explosion of Big Data, the means to collect and store it, and intuitive software tools, data and data-driven insight have become more accessible to business professionals than ever before. Therefore, in the midst of global competition, there is a huge opportunity to make better managerial decisions using data and analytics: to increase revenue while decreasing cost by building better products, improving the customer experience, catching fraud before it happens, and improving customer engagement through targeting and customization. More and more companies are now training their employees in business analytics to drive effectiveness and efficiency in their day-to-day decision-making processes.