Business Analytics Domain
As shown in Figure 1-1, the domain of business analytics covers four major areas of study: databases and data warehouses, descriptive analytics, predictive analytics, and prescriptive analytics. Databases and data warehouses store information effectively and retrieve it efficiently; descriptive analytics reports on the past. Predictive analytics uses past data to build models that predict the future, and prescriptive analytics uses optimization, heuristic, or simulation models to specify optimal solutions and prescribe the best courses of action.
Figure 1-1 Overview of business analytics
Databases and Data Warehouses
Databases and data warehouses serve as the foundation of business analytics. Every business analytics process starts by storing the data appropriately in operational databases and ensuring data integrity. The data analyst must understand the principles of database design and implementation through all of its steps: conceptual, logical, and physical modeling. The most common approach to database design is relational modeling. Relational databases are distributed throughout organizations and may belong to different departments. They may be stored on different platforms and may use incompatible data formats.
Data warehouses consolidate information gathered from disparate sources and give business users access to customized information so they can make better decisions. Data stored in disparate sources must be loaded into the data warehouse in a consistent format. Data warehouses can also combine structured, semistructured, and unstructured data. Extract-transform-load (ETL) processes can also be used to refresh the warehouse target objects with updates from the operational databases. Data warehouses are then available for queries through multidimensional objects such as cubes.
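The consolidation step can be sketched in a few lines of Python. This is a minimal illustration under assumed data, not a description of any real warehouse: two hypothetical departmental sources record ticket sales with different field names, date formats, and currency units, and two small transform functions map both into one shared row layout before loading. All names (box_office_rows, web_rows, from_box_office, from_web, warehouse) are invented for the example.

```python
from datetime import datetime

# Extract: hypothetical rows pulled from two sources with incompatible formats.
box_office_rows = [{"title": "Movie A", "sold_on": "03/15/2014", "amount": "12.50"}]
web_rows = [{"movie": "Movie B", "date": "2014-03-16", "total_cents": 1100}]


def from_box_office(row):
    """Transform a box-office row (US-style date, dollar string) to the shared layout."""
    sold = datetime.strptime(row["sold_on"], "%m/%d/%Y").date()
    return {
        "title": row["title"],
        "sale_date": sold.isoformat(),
        "amount_usd": float(row["amount"]),
    }


def from_web(row):
    """Transform a web-sales row (ISO date, integer cents) to the shared layout."""
    sold = datetime.strptime(row["date"], "%Y-%m-%d").date()
    return {
        "title": row["movie"],
        "sale_date": sold.isoformat(),
        "amount_usd": row["total_cents"] / 100,
    }


# Load: every row in the (in-memory) warehouse now has the same fields and units.
warehouse = [from_box_office(r) for r in box_office_rows]
warehouse += [from_web(r) for r in web_rows]
```

A production ETL process would add validation, error handling, and incremental refresh logic; the sketch shows only why a consistent target format matters.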
After the information is captured and stored in operational databases or data warehouses, the data analyst may perform online analytical processing, create business reports, visualize data, and produce operational business intelligence. Structured Query Language (SQL) is a programming language that can be used to create databases, store and update data in these databases, and retrieve information from them.
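The three uses of SQL named above (creating a database, storing and updating data, and retrieving information) can be illustrated with a short sketch. It uses Python's built-in sqlite3 module with an in-memory database; the ticket_sales table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create: define a (hypothetical) table of ticket sales.
conn.execute(
    "CREATE TABLE ticket_sales ("
    "id INTEGER PRIMARY KEY, movie TEXT NOT NULL, "
    "genre TEXT NOT NULL, tickets INTEGER NOT NULL)"
)

# Store: insert a few sample rows.
conn.executemany(
    "INSERT INTO ticket_sales (movie, genre, tickets) VALUES (?, ?, ?)",
    [("Movie A", "sci-fi", 3), ("Movie B", "comedy", 2), ("Movie A", "sci-fi", 1)],
)

# Update: correct the ticket count on the first sale.
conn.execute("UPDATE ticket_sales SET tickets = 4 WHERE id = 1")

# Retrieve: total tickets sold per movie.
rows = conn.execute(
    "SELECT movie, SUM(tickets) FROM ticket_sales GROUP BY movie ORDER BY movie"
).fetchall()
conn.close()
```

The same four statements (CREATE TABLE, INSERT, UPDATE, SELECT) apply, with minor dialect differences, to most relational database systems.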
The nature of data capturing and processing has changed dramatically in the era of Big Data. Nonrelational, distributed, open-source, and horizontally scalable databases (commonly referred to as NoSQL databases) have emerged and are used in real-time web applications.15 Whereas relational databases consist of related tables, records, and fields, NoSQL databases contain unstructured data in the form of key-value pairs, graphs, or documents.
Descriptive Analytics
Descriptive analytics is used to quantitatively describe the main features of organizational data. Descriptive analytics aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. Some of the common tools used in descriptive statistics include sampling, mean, mode, median, standard deviation, range, variance, stem-and-leaf diagrams, histograms, interquartile range, quartiles, and frequency distributions. The results of descriptive statistics are often presented as charts, tables, and single-number summary statistics.
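Several of these summary measures can be computed directly with Python's standard library. The following is a small sketch on an invented sample (weekly_tickets, the number of tickets bought by seven hypothetical customers in one week); the variance and standard deviation are the sample versions, and the quartiles use the default method of statistics.quantiles.

```python
import statistics
from collections import Counter

# Hypothetical sample: tickets bought by seven customers in one week.
weekly_tickets = [2, 4, 4, 5, 7, 8, 12]

mean = statistics.mean(weekly_tickets)
median = statistics.median(weekly_tickets)
mode = statistics.mode(weekly_tickets)
value_range = max(weekly_tickets) - min(weekly_tickets)

# Sample variance and sample standard deviation (n - 1 in the denominator).
variance = statistics.variance(weekly_tickets)
std_dev = statistics.stdev(weekly_tickets)

# Quartiles and the interquartile range.
q1, q2, q3 = statistics.quantiles(weekly_tickets, n=4)
iqr = q3 - q1

# Frequency distribution: how many customers bought each number of tickets.
frequency = Counter(weekly_tickets)

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}")
print(f"variance={variance}, std_dev={std_dev:.2f}, IQR={iqr}")
print(f"frequencies={dict(frequency)}")
```

Each value describes only the seven observations in the sample; none of them is an inference about customers outside it.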
Suppose that Fandango, the leading online ticket seller for movie theaters, wants to investigate the movie preferences of its customers during the past year. Fandango sells millions of tickets for approximately 20,000 movie theaters across the United States.16 Information about customers, movie theaters, ticket sales, and show times is automatically captured and stored in structured databases. Then, periodically, this information is extracted, transformed, and loaded into data warehouses or data marts, which mostly reside on distributed servers. Fandango data scientists will then use descriptive analytics. For example, using a sample of movie titles, the analysts can investigate the correlations among total sales for different movies. Using a sample of moviegoers, they can calculate the average ticket sales for a week, the most popular movie, the distribution of customers among movie genres, the busiest hours of the day in the movie theater, the age distribution of moviegoers, the gender distribution, and so on. This type of data analysis helps Fandango set ticket prices, offer discounts for certain movies or show times, and assign show times of the same movie in different theaters.
Predictive Analytics
Whereas descriptive statistics are considered a straightforward presentation of facts, predictive analytics uses statistical modeling to draw conclusions and predict future behavior, based on the assumption that what has happened in the past will continue to happen in the future. Some of the common tools used in predictive analytics include cluster analysis, association analysis, multiple regression, logistic regression, decision tree methods, neural networks, and text mining. Forecasting tools, such as time series and causal relationships, are also classified as predictive analytics.
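To make the idea concrete, here is a minimal sketch of the simplest member of the regression family: a least-squares line fitted to one predictor and used to forecast the next period. The data (weekly ticket counts for five weeks) and the function name fit_line are invented for illustration; the forecast is meaningful only under the assumption stated above, that the past pattern continues.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: tickets sold in each of the past five weeks.
weeks = [1, 2, 3, 4, 5]
tickets = [112, 118, 131, 139, 150]

slope, intercept = fit_line(weeks, tickets)

# Predict week 6 by extending the fitted trend one step into the future.
forecast_week_6 = intercept + slope * 6
print(f"slope={slope:.1f}, intercept={intercept:.1f}, forecast={forecast_week_6:.1f}")
```

Multiple regression extends the same idea to several predictors, and in practice an analyst would usually rely on a statistics library rather than hand-written formulas.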
How does Fandango know to send e-mails to its members with discount offers for a specific movie on a specific day? Predictive analytics tools can crunch terabytes of data to determine that although John likes science fiction movies, he has not seen the latest sci-fi movie, which has been in theaters since last Friday. How does a grocery store checkout system generate valuable coupons just in time and on the back of the printed receipt? Suppose that Julie’s favorite whole-grain cereal was missing from her shopping basket that day. The computer matches Julie’s past cereal purchases to ongoing promotions in the store, and right there, on the spot, Julie receives a coupon for the whole-grain cereal that she will most likely buy.
Prescriptive Analytics
Some of the most common models used in prescriptive analytics include linear programming, sensitivity analysis, integer programming, goal programming, nonlinear programming, and simulation modeling. Practitioners use prescriptive analytics to make decisions based on data. Continuing with the Fandango example, the prescriptive tools allow ticket price offerings to change every hour. Fandango has learned when the most desirable movie times are by sifting through millions and millions of show times instantaneously. This information is then used to set an optimal price at any given time, based on the supply of show times and the demand for movie tickets, thus maximizing profits.
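The pricing decision can be sketched as a small optimization problem. The example below is a simplified illustration, not Fandango's actual method: it assumes a linear demand curve (demand falls by a fixed number of tickets per dollar of price), caps sales at the theater's seat capacity, and uses an exhaustive search over candidate prices rather than linear programming. Revenue stands in for profit, and every number and name (expected_revenue, best_price, base_demand, sensitivity) is hypothetical.

```python
def expected_revenue(price, capacity, base_demand, sensitivity):
    """Revenue at a given price under an assumed linear demand curve."""
    demand = max(0.0, base_demand - sensitivity * price)
    seats_sold = min(capacity, demand)  # cannot sell more seats than exist
    return price * seats_sold


def best_price(candidates, capacity, base_demand, sensitivity):
    """Pick the candidate price with the highest expected revenue."""
    return max(
        candidates,
        key=lambda p: expected_revenue(p, capacity, base_demand, sensitivity),
    )


# Candidate prices from $5.00 to $25.00 in $0.50 steps.
candidates = [p / 2 for p in range(10, 51)]

# Same 100-seat show time, with strong (peak) and weak (off-peak) demand.
peak_price = best_price(candidates, capacity=100, base_demand=300, sensitivity=10)
off_peak_price = best_price(candidates, capacity=100, base_demand=120, sensitivity=10)
print(f"peak: ${peak_price:.2f}, off-peak: ${off_peak_price:.2f}")
```

Under these assumptions the recommended price is higher when demand is strong and lower when it is weak. Rerunning the search each hour with updated demand estimates gives the kind of dynamic pricing described here.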
Prescriptive analytics can help the movie industry ensure that its pricing structures are optimally set to contribute to bottom-line results. Similarly, prescriptive analytics can help airlines maximize their revenues by charging the highest prices when demand is highest and lowering prices when demand is low. The combination of Big Data with prescriptive tools allows the airlines to adopt pricing policies that go beyond traditional peak, off-peak, or shoulder seasons. Changes are dynamic and made in real time; they can be implemented by day of the week or even by hour of the day. Prescriptive analytics is the engine of today’s real-time business intelligence.