- Overview
- Statistics and Machine Learning
- The Impact of Big Data
- Supervised and Unsupervised Learning
- Linear Models and Linear Regression
- Generalized Linear Models
- Generalized Additive Models
- Logistic Regression
- Enhanced Regression
- Survival Analysis
- Decision Tree Learning
- Bayesian Methods
- Neural Networks and Deep Learning
- Support Vector Machines
- Ensemble Learning
- Automated Learning
- Summary
Summary
In this chapter, we surveyed key techniques for predictive analytics. Some techniques, such as linear regression, are mature, well understood, widely used, and broadly available in stable software tools. Other methods, such as deep learning, are quite new: scientists still seek to understand their limits, software implementations are rare, and the methods are not yet widely used in analytical applications. A third category of techniques, including automated learning, is in active development as we write this book.
As we noted at the beginning of this chapter, hundreds of predictive modeling techniques are in use, and scientists add new techniques every day. As with any technology, practitioners make small changes to address specific problems: to produce more accurate models with specific types of data, to run faster, to work efficiently with more predictors, and so forth.
The business stakeholder need not understand every detail of the techniques used by analysts to build predictive models; instead, the stakeholder should focus on two key principles. First, in most cases, it is impossible to know in advance which technique will produce the most accurate predictions for a particular problem; the only way to find out is to experiment with a broad spectrum of techniques. (The stakeholder should view with suspicion any claim that one method is always the best.)
Second, the ultimate test of any predictive model is how well it predicts when placed in production. The theoretical merits and demerits of various techniques are interesting to academics; in actual applications, however, predictive power and performance are the sole measures of a model.
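The two principles can be illustrated with a minimal sketch, written here in Python using only the standard library. It fits three candidate techniques to the same training data and ranks them by error on data held out from training, which serves as a rough stand-in for performance in production. Everything in it is an assumption made for illustration: the synthetic data, the choice of candidates, the function names (`make_data`, `fit_mean`, `fit_linear`, `fit_knn`, `compare_techniques`), and the use of root mean squared error as the measure. It is not a recommended workflow or a substitute for a real modeling tool.

```python
import math
import random


def make_data(n, seed):
    """Generate synthetic (x, y) pairs; the relationship is invented for illustration."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * math.sin(x) + 0.5 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Simple linear regression with one predictor, fitted by least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbor regression: average the targets of the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on the given data."""
    squared = sum((model(x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(ys))


def compare_techniques(seed=0):
    """Fit every candidate on the training split and score it on the holdout split."""
    xs, ys = make_data(300, seed)
    train_x, train_y = xs[:200], ys[:200]
    test_x, test_y = xs[200:], ys[200:]
    candidates = {
        "mean_baseline": fit_mean,
        "linear_regression": fit_linear,
        "k_nearest_neighbors": fit_knn,
    }
    return {
        name: rmse(fit(train_x, train_y), test_x, test_y)
        for name, fit in candidates.items()
    }


results = compare_techniques()
best = min(results, key=results.get)
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: holdout RMSE = {score:.3f}")
print("best on holdout:", best)
```

The ranking this sketch prints applies only to this particular synthetic data set. With a different relationship between predictor and target, a different technique may come out ahead, which is the reason to experiment rather than assume.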