Designing the Multiple Regression
Suppose that you have data on 50 cars, including each car’s weight in pounds, the mean speed at which it has been driven, and its mean miles per gallon (MPG). You’re interested in the effect that a car’s weight and average speed have on the miles per gallon of fuel that the car achieves.
One way to approach the problem is with one analysis using Weight as the sole predictor variable and another using Speed as the sole predictor. You could choose the analysis that returns the greater R² value as the one to use in assessing a car’s predicted MPG.
One problem with running and comparing the two analyses is that the two predictor variables, Speed and Weight, might not be independent of one another; that is, they might be correlated and therefore share variance. In that case, you can’t tell how much of the shared variance is shared by Speed and MPG and how much is shared by Weight and MPG. But it’s very likely that running two analyses and summing the R² values will double count some amount of the variance (because it’s shared by the two predictor variables) and therefore mislead you as to the strength of the relationships.
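To make the double counting concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the 50 cars are simulated, the coefficients that generate Weight, Speed, and MPG are invented, and the helper names pearson_r and residuals are hypothetical. Speed is deliberately built to correlate with Weight. The sketch computes the two single-predictor R² values, then counts Speed again using only the part of Speed that Weight cannot predict.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residuals(y, x):
    """Residuals of y after a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical data: 50 simulated cars. Speed is tied to Weight on purpose
# so that the two predictors share variance. All coefficients are made up.
rng = random.Random(42)
weight, speed, mpg = [], [], []
for _ in range(50):
    w = 3000 + 400 * rng.gauss(0, 1)
    s = 45 + 0.01 * (w - 3000) + 5 * rng.gauss(0, 1)
    m = 50 - 0.005 * w - 0.2 * s + 1.5 * rng.gauss(0, 1)
    weight.append(w)
    speed.append(s)
    mpg.append(m)

# Two separate single-predictor analyses.
r2_weight = pearson_r(mpg, weight) ** 2
r2_speed = pearson_r(mpg, speed) ** 2

# Strip from Speed whatever Weight already predicts, then ask how much
# MPG variance the leftover (unique) part of Speed accounts for.
speed_resid = residuals(speed, weight)
r2_speed_unique = pearson_r(mpg, speed_resid) ** 2

# R-squared for both predictors together, with no variance counted twice.
r2_both = r2_weight + r2_speed_unique

print(f"R2 Weight alone:       {r2_weight:.3f}")
print(f"R2 Speed alone:        {r2_speed:.3f}")
print(f"Sum of the two:        {r2_weight + r2_speed:.3f}")
print(f"R2 both, no overlap:   {r2_both:.3f}")
```

With predictors this strongly related, the sum of the two separate R² values should come out well above the combined figure, and the gap is the variance that was counted twice.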
Only in the limiting cases in which the predictor variables share no variance with one another (so they’re independent) or in which they’re perfectly correlated (so they share all their own variance) can you tell what’s going on. Of course, that sort of complete independence or dependence appears only in samples handed out in stats class. (An exception occurs when regression is used in preference to the analysis of variance and the categorical predictor variables are designed to be independent of one another.)
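The limiting cases can be checked with the standard identity for R² when there are exactly two predictors, which needs only the three pairwise correlations. The sketch below is illustrative only: the function name r2_two_predictors and the correlation values passed to it are invented for the example, not taken from any data set.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for an outcome regressed on two predictors, computed
    from the pairwise correlations (standard two-predictor identity)."""
    if abs(r_12) == 1.0:
        # Perfectly correlated predictors: the second one is redundant,
        # so R-squared is just the squared correlation with the first.
        return r_y1 ** 2
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Independent predictors: R-squared is the plain sum, 0.36 + 0.25.
print(round(r2_two_predictors(-0.6, -0.5, 0.0), 4))   # 0.61

# Same correlations with the outcome, but the predictors correlate 0.5:
# the plain sum of 0.61 would overstate the relationship.
print(round(r2_two_predictors(-0.6, -0.5, 0.5), 4))   # 0.4133

# Perfectly correlated predictors: nothing is gained from the second one.
print(round(r2_two_predictors(-0.6, -0.6, 1.0), 4))   # 0.36
```

Only in the first and last calls can you read the answer straight off the single-predictor results; in the middle case you need the correlation between the predictors to sort out the overlap.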
Whether your primary interest is in the total variance in the outcome variable that’s associated with variance in both the predictor variables, or in the amount that’s shared with each predictor individually, you’re going to need to arrange things to combine the predictors without double counting the variance shared with the outcome. Multiple regression does that for you, whether by means of matrix algebra or by means of QR decomposition, and I wouldn’t have spent so much ink on the topic if Bayesian methods didn’t do it too.
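As a rough sketch of the two computational routes, the Python code below fits the same multiple regression twice with NumPy. I am assuming that "matrix algebra" refers to solving the normal equations, (X'X)b = X'y, which is one common reading. The data are simulated and the generating coefficients are invented, so treat the numbers as illustrative rather than as results from real cars.

```python
import numpy as np

# Hypothetical data: 50 simulated cars with Speed correlated with Weight.
rng = np.random.default_rng(0)
n_cars = 50
weight = 3000 + 400 * rng.standard_normal(n_cars)
speed = 45 + 0.01 * (weight - 3000) + 5 * rng.standard_normal(n_cars)
mpg = 50 - 0.005 * weight - 0.2 * speed + 1.5 * rng.standard_normal(n_cars)

# Design matrix: a column of 1s for the intercept, then the two predictors.
X = np.column_stack([np.ones(n_cars), weight, speed])

# Route 1: normal equations, solving (X'X) b = X'y.
b_normal = np.linalg.solve(X.T @ X, X.T @ mpg)

# Route 2: QR decomposition, X = QR, then solving R b = Q'y.
Q, R = np.linalg.qr(X)            # reduced form: Q is 50 x 3, R is 3 x 3
b_qr = np.linalg.solve(R, Q.T @ mpg)

# R-squared for the two predictors taken together.
fitted = X @ b_qr
ss_resid = np.sum((mpg - fitted) ** 2)
ss_total = np.sum((mpg - mpg.mean()) ** 2)
r2 = 1 - ss_resid / ss_total

print("Normal equations:", b_normal)
print("QR decomposition:", b_qr)
print("Largest difference:", np.max(np.abs(b_normal - b_qr)))
print("R2, both predictors:", round(float(r2), 3))
```

The two routes should agree to many decimal places on data like these. QR is often preferred in practice because it never forms X'X, a step that squares the condition number of the problem and so loses precision when predictors are strongly correlated.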