Using Excel with Mixed and Nested Models
In the first article in this two part series, Nested and Random Factors in ANOVA, I discussed two special types of factors in the analysis of variance:
- Nested factors that have levels that do not appear at every level of another factor. An example is hospitals that provide, or do not provide, a particular treatment. Hospital is nested within Treatment. Factors whose levels do appear at every level of another factor are termed crossed factors.
- Random factors that comprise levels that are considered random selections from a larger population. If Hospital is a factor in an experiment, it is very likely that the experimenter wants to generalize the findings to other hospitals not included in the experiment. In that case, Hospital is a random factor. Factors whose levels exhaust the levels of interest, such as Male patients versus Female patients, are termed fixed factors.
The Data Analysis Add-in's ANOVA Tools
Excel's Data Analysis add-in includes three tools that perform ANOVAs:
- ANOVA: Single Factor
- ANOVA: Two-Factor With Replication
- ANOVA: Two-Factor Without Replication
The ANOVA: Single Factor Tool
The single factor tool can be a quick and handy way of running a one-way ANOVA, especially if you're primarily interested in F ratios and probability levels. If you want the richer analysis available from the least squares approach to ANOVA, you're better off with LINEST() or the Data Analysis add-in's Regression tool.
The ANOVA: Two-Factor Without Replication Tool
When you have two factors and one observation per cell, you might think that the add-in's Two-Factor Without Replication tool is the method of choice. This tool is, in fact, just a means of analyzing a repeated measures design. As such, it requires that your data set meets the compound symmetry assumption, which implies homogeneous variances and covariances. This assumption is rarely met, and you're much better off running a multivariate ANOVA if you have a true repeated measures design or any sort of randomized block design.
The ANOVA: Two-Factor With Replication Tool
The ANOVA: Two-Factor With Replication tool can be useful if you have exactly two factors and if each cell contains the same number of observations.
Additionally, the two-factor tool assumes that both factors are fixed and no nesting is involved. However, it's a fairly simple matter to modify the tool's results so that it treats one of the factors as nested, or as random and crossed with a fixed factor.
This article shows you how to do so, but first have a look at the results of an analysis in which both factors are fixed and crossed. See Figure 1.
Figure 1 Two aspects of the data in Figure 1 are important to note: the way to lay out the input data and the denominator of the F ratios
Data Layout
The first aspect is the layout of the input data in columns A through C. There are two levels of the Treatment factor whose labels appear in cells B1 and C1. (If there were three or more levels of the Treatment factor, they could occupy columns D, E, and so on.)
In this case, there are also two levels of the Sex factor. If some factor such as Ethnicity, rather than Sex, were under investigation, additional levels could be identified in subsequent rows.
You don't have to supply the labels in Column A or in Row 1. You could leave that column and row blank. But you have to include them in the input range that you identify in the tool's dialog box. See Figure 2.
Figure 2 Notice that there is no Labels check box in the dialog box. The tool uses any labels in the descriptive section only
When you choose the ANOVA: Two-Factor With Replication tool from the Data Analysis dialog box, you see the dialog box shown in Figure 2. As the data is laid out in Figure 1, you should enter A1:C21 in the Input Range edit box. That is, you don't need to supply the labels in Column A or Row 1, but you do have to include a row and a column in which the labels would be if you had supplied them.
Also note the edit box for Number of Rows per Sample in Figure 2. There's no provision for specifying, say, 9 rows for Sample 1 and 11 rows for Sample 2. Each "sample" is required to have the same number of observations. In this way, the tool avoids dealing with the (fairly common) situation of an unequal number of observations per design cell.
Calculating the F Ratios
Figure 1 also shows that in a two-factor ANOVA with fixed factors, the F ratios for the main effects and the interaction all use the Mean Square Within as the denominator. The F ratios in the range I4:I6 are each the result of dividing the associated mean square in H4:H6 by the mean square in H7. This is the correct approach with two crossed and fixed factors.
Adapting the Data Analysis Tool for a Random Factor
Although the ANOVA: Two-Factor With Replication tool assumes that both factors are fixed, you can easily adapt it to account for one random and one fixed factor. It can be important to do so because treating a random factor as fixed can mislead you regarding the significance of the factor that is actually fixed.
Understanding the Labels
Figure 3 shows an example of what can happen.
Figure 3 The Hospital factor is random, and the Method factor is fixed
Figure 3 (like Figure 1) shows that the Two-Factor With Replication tool always uses the labels "Samples" and "Columns" to represent, respectively, the factors that occupy the rows and the columns of your input data. This usage isn't particularly helpful—there's no reason, for instance, why the columns should not be thought of as representing samples. (In fact, the very use of the term "Samples" in the output suggests that the tool is treating the factor as a random factor rather than as a fixed factor. Don't be misled: The tool initially treats both factors as fixed.)
Nevertheless, suppose that you want to think of the Hospital factor as random; that is, you want to generalize your findings to all hospitals, not just to the hospitals from which you took your data. And you want to regard the three methods as representing a fixed factor, such as three methods commonly used in hospitals to treat a particular disease. Your objective in running an ANOVA is to determine whether the methods result in reliably different outcomes. Put another way, you want to quantify the statistical significance of the differences in the mean values of the three methods.
Choosing the Denominator
Figure 3 shows two ANOVA tables, both based on the input data in the range A1:K7. The first ANOVA table, in the range A10:G16, shows the results that the ANOVA tool calculates. Note in particular that the F ratio for the "Sample" factor, in cell E12, is 9.007. With 2 and 30 degrees of freedom, that ratio is significant at the .001 level (see cell F12).
That F ratio of 9.007 is calculated by dividing the "Sample" mean square of 71.292 by the Within mean square of 7.916 (see cells D12 and D15). But the Within mean square is the wrong denominator for this F ratio.
When you have two factors in an ANOVA, one of which is fixed and one of which is random, the F ratio for the fixed factor is the ratio of the mean square for that factor to the mean square for the interaction effect.
I haven't the room to get into the theory of the expected values of mean squares here, but I can state that the expected value of the mean square for the fixed factor in this situation includes the variance for the interaction effect as well as the variance for the fixed effect. Therefore, the proper denominator for the fixed effect's F ratio is the mean square for the interaction, not the mean square within.
Notice the second ANOVA table, in the range A18:G24. I have changed the F ratio for the Sample factor in the second table so that it divides the mean square for the Sample factor by the mean square for the interaction. The result is a much smaller F ratio, shown in cell E20 as 1.699. With 2 and 18 degrees of freedom (because the df for the interaction is 18 while the df for Within is 30), that F ratio is not significant at even the 0.2 level. Few would regard that as evidence of a reliable effect for the Method factor.
The ANOVA tool did not get the calculations wrong. It simply treats both factors as fixed when one of them is actually random (and unfortunately the documentation doesn't warn you of that fact).
I have also modified the entry in cell F20 to this formula:
=F.DIST.RT(E20,C20,C22)
The tools found in the Data Analysis add-in generally return fixed values rather than worksheet formulas, and that's true of the two-factor ANOVA tool. So simply changing the denominator of the F ratio in cell E20 does not result in a new and accurate assessment of the probability of the revised F ratio. After entering the proper formula for the F ratio in E20:
=D20/D22
I entered the F.DIST.RT function in F20 to obtain the area in the right tail of the central F distribution with 2 and 18 degrees of freedom: in this case, more than 20% of that area lies to the right of the obtained F ratio.
The example shown in Figure 3 shows how you can be misled into thinking that a fixed factor involves a significant difference when in fact it does not. All it takes is failing to take account of the presence of a random factor in the design, treating it instead as fixed. That's the case with the Hospital factor in this example. The ANOVA tool treats it as fixed and there is a consequence for Method, the factor that actually is fixed.
But the effect can go the other way. It's entirely possible to get a non-significant finding for the fixed factor if you treat what's actually a random factor as fixed. Then you might decide that (using this example) the Method makes no difference, when in fact it does. It depends on the relative sizes of the mean square for the interaction and the mean square within. (The degrees of freedom for the selected denominator also exert an effect on the probability of the F ratio.)
Adapting the Data Analysis Tool for a Nested Factor
A similar change to the ANOVA tool's results enables you to deal with a nested factor in a two-factor design. See Figure 4 for the layout issues involved.
Figure 4 This layout C2:K10 shows the true conceptual layout. The range C12:G20 shows the same data laid out for analysis
Data Layout for a Nested Design
Figure 4 shows two ways of laying out the data for a two-factor ANOVA with one factor nested in another. The range C2:K10 shows how the data is actually collected. Levels B1 through B4 of Factor B are found only in level A1 of Factor A. Levels B5 through B8 of Factor B are found only in level A2 of Factor A. This is a true nested design (sometimes termed a hierarchical design).
But although the layout in C2:K10 of Figure 4 is conceptually accurate, it can't be analyzed by the Excel's two-factor ANOVA tool. If you identify C2:K10 as the Input Range in the tool's dialog box (see Figure 2), then Excel displays an error message regarding non-numeric data when you click OK.
The solution is to rearrange the data as shown in C12:G20 in Figure 4. We'll temporarily pretend that we have a fully crossed design, with only four levels of Factor B that cross both levels of Factor A. If you run the ANOVA: Two-Factor With Replication tool on the data in C12:G20, you get the results shown in Figure 5.
Figure 5 The range A13:G19 contains the actual ANOVA tool's output
A nested design such as this one doesn't have an interaction term in the traditional sense. A fully crossed design with two factors has an interaction term that addresses the question of whether one factor operates differently at different levels of the other factor.
But in a nested design, that question can't be addressed. You don't have all the levels of each factor represented at all levels of the other factor, so you can't assess whether one level of the nested factor acts differently across levels of the other factor: The requisite data just isn't there. Instead, the best you can do is to isolate variability in the results according to its apparent source.
The correct analysis of a two-factor nested design appears in the range A21:G27 in Figure 5. There are two points to note: obtaining sums of squares and mean squares for the nested factor, and getting the proper F ratio for the other factor.
Getting the Sums of Squares
In the kind of situation discussed here, two factors including one nested factor, the sum of squares for the nested factor is easily computed by adding the sum of squares for the nested factor to the sum of squares for what the ANOVA tool thinks is the interaction term. So, in Figure 5, the sum of squares for the "B within A" term in cell B24 is the total of cells B15 and B16.
The same is true of the degrees of freedom for the "B within A" factor. Total the degrees of freedom for the nested factor and the interaction. The degrees of freedom for "B within A" in Figure 5 is the total of cells C15 and
C16.
Then the mean square for "B within A" is obtained by dividing B24 by C24. The F ratio for the nested factor is the mean square for that factor divided by the mean square within.
Calculating the F Ratio for the Nesting Factor
As is the case for the mixed model discussed earlier in this article, the proper denominator for the nesting factor's F ratio is the mean square for the nested variable. In this example, Factor B is nested within Factor A. Therefore, the F ratio for Factor A uses the mean square for Factor B as its denominator.
In Figure 5, note that the F ratio for Factor A in cell E23 is the result of dividing cell D23 by D24. The ANOVA tool, which assumes that the factors are fully crossed, correctly uses the mean square within as the denominator for the F ratio of each effect (both factors and their interaction).
But that's the wrong denominator for the nesting factor. Instead, you should use the mean square for the nested factor as the F ratio's denominator. And in this example, doing so returns an F ratio that is not significant. In contrast, treating the nested factor as crossed results in an F ratio of 5.27, which with 1 and 24 degrees of freedom is significant at the .03 level.
In Summary
The Data Analysis add-in for Excel offers an ANOVA: Two-Factor With Replication tool that is designed for use with two-factor designs in which the factors are both fixed and fully crossed. An additional requirement is that all design cells have the same number of observations.
Despite these restrictions, the tool can be a handy way to assess two-factor designs with one random and one fixed factor (a mixed model) and designs in which one factor is nested inside the other.
In each case, it's necessary to adjust the F ratio of the fixed factor or, when there's a nested factor, the F ratio of the nesting factor. This is an easy adjustment given the results that the ANOVA tool writes to the worksheet.
In the case of the nested factor, it's also necessary to combine the sums of squares and the degrees of freedom for the nested factor with the interaction term.
These simple modifications extend the applicability of the ANOVA tool. They can also help protect the user against erroneously concluding that a significant effect is non-significant, and also against the reverse error of concluding that a non-significant effect is significant.