Sample Regression Analysis
To lay the groundwork for a comparison of Bayesian regression analysis with traditional least squares, Figure 6.2 shows the basics of a very small analysis, rendered in Excel. It includes
Values in B2:D6, which are used as inputs to Excel’s LINEST function.
Values in the range C2:D6, which contains two predictor variables in columns C and D.
Values in cells B2:B6, which contain a predicted variable.
The LINEST function, in the range F2:H6, which displays the function's results. Every cell in F2:H6 is calculated by the same dynamic array formula, repeated here:
=LINEST(B2:B6,C2:D6,,TRUE)
After entering the formula in F2:H6 of Figure 6.2, I copied its results and pasted them as static values in F8:H12. I did that so you can use Excel’s Solver, or change the regression coefficients manually, and then compare the results of that change with the original results. In that way you can demonstrate for yourself what happens when you try to improve the regression’s accuracy by adjusting the returned coefficients and intercept. (You cannot change just part of an array formula; it’s all or nothing. But once you have saved the results of LINEST as values, you are free to change any of those values as you please.)
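For readers who prefer to see the arithmetic outside the worksheet, here is a minimal Python sketch of the same least-squares fit that LINEST performs. The data values below are hypothetical stand-ins, not the values shown in Figure 6.2; only the structure (five records, two predictors, one outcome) matches the worksheet.

```python
import numpy as np

# Hypothetical stand-ins for the outcome in B2:B6 and the
# predictors in C2:D6; the actual values appear in Figure 6.2.
y = np.array([2.0, 3.5, 1.0, 4.0, 3.0])
c = np.array([1.0, 2.0, 0.5, 3.0, 2.0])   # column C
d = np.array([2.0, 1.0, 3.0, 2.5, 2.0])   # column D

# Design matrix with a column of ones for the intercept,
# matching LINEST's default behavior of fitting a constant.
X = np.column_stack([c, d, np.ones(5)])

# Ordinary least-squares solution.
b_c, b_d, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
```

Note that LINEST reports its first row of results in reverse predictor order: the coefficient for column D comes first, then the coefficient for column C, then the intercept. That is why, in the worksheet, F2 pairs with column D and G2 with column C.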
The regression equation’s predicted value for the first of the five records is shown in cell L2 of Figure 6.3. It is calculated with this equation:
=$H$2+$G$2*C2+$F$2*D2
which is then copied and pasted into L3:L6 of Figure 6.3.
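That worksheet formula is simply the linear prediction equation, intercept plus coefficient times predictor, applied to each record in turn. A short sketch of the same calculation in Python, using hypothetical coefficient and data values in place of the ones in Figure 6.3:

```python
# Hypothetical stand-ins for the intercept (cell H2) and the
# coefficients for column C (G2) and column D (F2).
intercept, b_c, b_d = 1.2, 0.8, -0.3

# One (C, D) predictor pair per record, as in C2:D6.
records = [(1.0, 2.0), (2.0, 1.0), (0.5, 3.0), (3.0, 2.5), (2.0, 2.0)]

# Apply the regression equation to each record, just as
# =$H$2+$G$2*C2+$F$2*D2 does when copied down through L2:L6.
predictions = [intercept + b_c * c + b_d * d for c, d in records]
```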
Figure 6.3 A slight change in the input data or in a regression coefficient can result in a dramatic change in the results.
I also entered the formulas in L2:L6 as values in J2:J6 by first copying the formulas, and then pasting them into J2:J6, choosing one of the Paste Values options.
Finally, I entered formulas for the sum of the squared deviations in J8:J12 and L8:L12, and their sums in J14:L14.
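The sum-of-squared-deviations formulas reduce to two steps: square each observed-minus-predicted difference, then total the squares. A sketch with made-up observed and predicted values (the real ones are in the Fig 6.3 worksheet):

```python
# Hypothetical observed outcomes (B2:B6) and predictions (J2:J6 or L2:L6).
observed  = [2.0, 3.5, 1.0, 4.0, 3.0]
predicted = [2.1, 3.3, 1.2, 3.9, 3.0]

# Squared deviation for each record, as in J8:J12 and L8:L12 ...
squared_devs = [(o - p) ** 2 for o, p in zip(observed, predicted)]

# ... and their total, as in J14 and L14.
sse = sum(squared_devs)
```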
Open the Excel workbook for this chapter and activate the worksheet named Fig 6.3. Verify that the sums of the squared deviations are both 0.415.
Now change the value of one or both of the regression coefficients in cells F2 and G2, or the intercept in H2. Make your entry a numeric value. Notice that the values displayed in cells J14 and L14 no longer equal one another; in fact, the value in L14 is now larger than the value in J14.
The value in cell J14 is unchanged from its original value. That’s why I pasted the predictions into J2:J6 as static values—so that the original sum of squared deviations would be unaffected by your entry of a different value in cell F2, G2, or H2. Meanwhile, the sum of the squared deviations in L14, which depends on the live formulas, has increased above its value when the LINEST results were undisturbed. And that means the regression equation is not doing as accurate a job of forecasting outcomes as when you left its coefficients alone.
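This is not an accident of the particular data: the least-squares coefficients minimize the sum of squared deviations, so any change to them can only leave that sum the same or larger. The point can be verified in a few lines of Python; the data below are hypothetical, standing in for the values in the worksheet.

```python
import numpy as np

# Hypothetical stand-ins for the outcome in B2:B6 and predictors in C2:D6.
y = np.array([2.0, 3.5, 1.0, 4.0, 3.0])
X = np.column_stack([[1.0, 2.0, 0.5, 3.0, 2.0],   # column C
                     [2.0, 1.0, 3.0, 2.5, 2.0],   # column D
                     np.ones(5)])                  # intercept column

# The least-squares fit, which is what LINEST computes.
coefs = np.linalg.lstsq(X, y, rcond=None)[0]

def sse(c):
    """Sum of squared deviations for a given coefficient vector."""
    resid = y - X @ c
    return float(resid @ resid)

# Nudge one coefficient, as the text suggests doing in F2, G2, or H2.
nudged = coefs.copy()
nudged[0] += 0.1

# The perturbed equation never fits better than the original.
assert sse(nudged) >= sse(coefs)
```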
What’s the point of all this? It’s that traditional, least squares techniques for regression analysis do not necessarily tell you everything you need or want to know about the relationship between an outcome variable and one or more predictor variables. Of course, you don’t want to ignore the point estimates that traditional least squares returns, but neither should you ignore the results of calculations that return an R2—and associated statistics—that fall just short of the maximized R2 that least squares guarantees.
To keep some flexibility in your analytic tools, it’s a good idea to view the results of a regression analysis through both a frequentist and a Bayesian lens. I’ve already discussed some of the issues surrounding the frequentist approach in this chapter—in particular, the worksheet function LINEST—so let’s now look first at regression methods that rely heavily on matrix algebra, and then at one alternative from the Bayesian toolbox, R’s quap function.