Tools for Causal Inference
Good causal inference can take time, so it’s important to balance stakeholder expectations when you scope a project. The Kellehers hope to empower data scientists to make informed decisions and not to accept purely correlative results lightly.
Develop the tools you need to do causal inference, including how machine-learning models can be used to get more general model specifications. The better you can predict an outcome with a machine-learning model, the better you can remove bias from an observational causal effect estimate.
13.1 Introduction
We’ve introduced a couple of machine-learning algorithms and suggested that they can be used to produce clear, interpretable results. You’ve seen that logistic regression coefficients can be used to say how much more likely an outcome will occur in conjunction with a feature (for binary features) or how much more likely an outcome is to occur per unit increase in a variable (for real-valued features). We’d like to make stronger statements. We’d like to say “If you increase a variable by a unit, then it will have the effect of making an outcome more likely.”
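To make the first, purely observational reading concrete, here is a minimal sketch of fitting a logistic regression and converting the coefficient to an odds ratio per unit increase in the feature. The data, variable names, and numbers are our own invented illustration, not an example from the chapter:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate a real-valued feature x and a binary outcome y whose
# log-odds increase by 0.7 per unit of x (values chosen arbitrarily).
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
log_odds = -1.0 + 0.7 * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

# C is set very large to effectively turn off regularization.
model = LogisticRegression(C=1e6).fit(x.reshape(-1, 1), y)
coef = model.coef_[0, 0]
print("coefficient (log-odds per unit of x):", round(coef, 2))
print("odds ratio per unit increase in x:", round(np.exp(coef), 2))

The fitted coefficient describes how the odds of the outcome change as you move through the observed data; nothing in the fit, by itself, says what would happen if you reached in and changed x.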
These two interpretations of a regression coefficient are so similar on the surface that you may have to read them a few times to see the difference. The key is that in the first case, we’re describing what usually happens in a system that we observe. In the second case, we’re saying what will happen if we intervene in that system and disrupt it from its normal operation.
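As a preview of the notation we’ll build up to (the phrasing here is ours, using the standard do-notation for interventions): the first statement is about an observational quantity like P(Y | X = x), the chance of the outcome among units where X happens to equal x, while the second is about an interventional quantity like P(Y | do(X = x)), the chance of the outcome if we set X to x ourselves. The two can differ whenever something else drives both X and Y.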
After we go through an example, we’ll build up the mathematical and conceptual machinery to describe interventions. We’ll cover how to go from a Bayesian network describing observational data to one that describes the effects of an intervention. We’ll go through some classic approaches to estimating the effects of interventions, and finally we’ll explain how to use machine-learning estimators to estimate the effects of interventions.
If you imagine a binary outcome, such as “I’m late for work,” you can imagine some features that might vary with it. Bad weather can cause you to be late for work. Bad weather can also cause you to wear rain boots. Days when you’re wearing rain boots, then, are days when you’re more likely to be late for work. If you look at the correlation between the binary feature “wearing rain boots” and the outcome “I’m late for work,” you’ll find a positive relationship. It’s nonsense, of course, to say that wearing rain boots causes you to be late for work. It’s just a proxy for bad weather. You’d never recommend a policy of “Don’t wear rain boots, so that you’ll be late for work less often.” That would be reasonable only if “wearing rain boots” were causally related to “being late for work.” As an intervention to prevent lateness, not wearing rain boots doesn’t make any sense.
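A small simulation makes the point concrete. The sketch below is our own, with made-up probabilities: bad weather raises the chance of both wearing rain boots and being late, but boots have no effect of their own. The naive comparison still shows boots “predicting” lateness, while comparing within each weather condition makes the association vanish:

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Bad weather is the common cause; boots and lateness each depend
# only on the weather, never on each other.
bad_weather = rng.binomial(1, 0.3, size=n)
boots = rng.binomial(1, np.where(bad_weather == 1, 0.8, 0.1))
late = rng.binomial(1, np.where(bad_weather == 1, 0.4, 0.1))

# Naive comparison: lateness looks much more common on boot days.
print("P(late | boots):   ", late[boots == 1].mean().round(3))
print("P(late | no boots):", late[boots == 0].mean().round(3))

# Holding the weather fixed, the apparent effect of boots disappears.
for w in (0, 1):
    in_w = bad_weather == w
    print("weather =", w,
          late[in_w & (boots == 1)].mean().round(3),
          late[in_w & (boots == 0)].mean().round(3))

Within each weather condition, boot-wearers and non-boot-wearers are late at roughly the same rate, which is exactly what you’d expect if the boots themselves do nothing.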
In this chapter, you’ll learn the difference between correlative (rain boots and lateness) and causal (rain and lateness) relationships. We’ll discuss the gold standard for establishing causality: an experiment. We’ll also cover some methods to discover causal relationships in cases when you’re not able to run an experiment, which happens often in realistic settings.