Increasing Power by Means of Sample Size
So you might consider increasing your sample size, even though adding observations usually entails greater cost. Using the layout shown in Figure 1, you can use Excel's Solver to tell you what sample size results in statistical power of, say, 90%.
To do so, you will need to have Solver installed with Excel (Solver is an add-in that typically comes with Excel on the installation disc or downloaded installation file). It's straightforward to install Solver, and the instructions to do so are found in many places, both online and in print (for example, in Chapter 2 of Statistical Analysis: Microsoft Excel 2010, Que Publishing, 2011).
With Solver installed, take these steps:
- If necessary, select the worksheet that contains the values and formulas, shown in Figure 1.
- Click the Data tab on Excel's Ribbon.
- Click Solver in the Data tab's Analysis group.
- In the Solver Parameters dialog box, enter C12 in the Set Objective box.
- Click the Value Of option button and enter 0.9 in the associated edit box.
- Enter C2 in the By Changing Variable Cells edit box.
- Click Solve.
Solver tries out different values for the number of observations per group until it finds one that satisfies your criterion of 90% power. In this case, you need 16 observations per group to obtain power of 90.52%. See Figure 2.
Figure 2 Other things held equal, in this case doubling the sample size from 8 to 16 roughly doubles the power from 47% to 90%.
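If you'd like to check Solver's answer outside Excel, here is a minimal Python sketch that performs the same search using SciPy's noncentral F distribution. The group means and the multiplier of 20 come from this article; the alpha of 0.05, the mean square within (MSW) of 25, and the use of SSB/MSW as the noncentrality parameter λ are stand-in assumptions for values the worksheet in Figure 1 actually supplies, so substitute your own before relying on the result.

```python
# A rough Python cross-check of what Solver is doing, using SciPy's
# noncentral F distribution. Alpha, MSW, and the SSB/MSW form of lambda
# are assumptions here, not values taken from the worksheet in Figure 1.
from scipy.stats import f, ncf

group_means = [2.42, 3.29, 8.28]   # the three group means quoted in the article
k = len(group_means)               # number of groups
grand_mean = sum(group_means) / k
ss_means = sum((m - grand_mean) ** 2 for m in group_means)  # roughly 20

ALPHA = 0.05   # assumed; the worksheet stores alpha in C4
MSW = 25.0     # assumed mean square within; read yours from the worksheet

def power(n_per_group):
    """Power of the one-way F test with n observations in each group."""
    ssb = n_per_group * ss_means            # sum of squares between
    dfb = k - 1                             # degrees of freedom between
    dfw = k * n_per_group - dfb - 1         # degrees of freedom within
    lam = ssb / MSW                         # noncentrality parameter lambda
    f_crit = f.ppf(1 - ALPHA, dfb, dfw)     # same idea as =F.INV(1-alpha, V_1, V_2)
    return ncf.sf(f_crit, dfb, dfw, lam)    # P(noncentral F exceeds critical F)

# Step the per-group sample size upward until power reaches 90%,
# much as Solver varies the value in C2.
n = 2
while power(n) < 0.90:
    n += 1
print(n, round(power(n), 4))
```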
The next three sections detail a few points about this analysis:
Sum of Squares Between and Within
The sum of squares between, or SSB, is 160 in Figure 1 and 320 in Figure 2. This is because I have used this formula:
=20 * C2
to calculate SSB. In a design that has an equal number of observations per group, the general formula for SSB is:
SSB = n * Σ(group mean - grand mean)²
where n is the number of observations per group and the sum is taken across the groups. With this data, the sum of the squared deviations of the group means from the grand mean equals 20:
(2.42-4.66)² + (3.29-4.66)² + (8.28-4.66)² = 20
With a sample size of 8 per group, that results in an SSB of 8 * 20 = 160, and of 16 * 20 = 320 with 16 per group. The mean square within, in contrast, does not change: it is the average of the within-group variances, and that average is expected to change only negligibly as the number of observations per group changes.
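To check that arithmetic outside the worksheet, here is a short Python sketch using the three group means quoted above; because the groups are equal in size, the grand mean is simply the average of the group means.

```python
# Verify SSB = n * (sum of squared deviations of group means from grand mean).
group_means = [2.42, 3.29, 8.28]
grand_mean = sum(group_means) / len(group_means)            # about 4.66
ss_means = sum((m - grand_mean) ** 2 for m in group_means)  # about 20

for n in (8, 16):                  # observations per group, as in Figures 1 and 2
    print(n, round(n * ss_means))  # 8 -> 160, 16 -> 320
```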
V_2, or Degrees of Freedom Within
The value for degrees of freedom within changes, along with the change in the number of observations per group. This is the formula used to determine DFW:
=((C8+1)*C2)-C8-1
That is:
- Get the number of groups by adding 1 to the Degrees of Freedom Between (DFB) in cell C8.
- Multiply the result by the number of observations per group in C2. The result is the total number of observations.
- Subtract DFB, and subtract 1 from the result.
The formula, therefore, returns the total number of observations minus DFB minus 1 (that is, N minus the number of groups), which is the Degrees of Freedom Within. Along with alpha and DFB, this value is needed to calculate the critical F value.
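Restated outside Excel, and assuming the three-group design described above (so the DFB in C8 is 2), the worksheet formula amounts to this:

```python
# The worksheet formula =((C8+1)*C2)-C8-1 restated: N minus DFB minus 1.
dfb = 2                        # degrees of freedom between (cell C8): 3 groups - 1

for n in (8, 16):              # observations per group (cell C2)
    groups = dfb + 1           # C8 + 1: the number of groups
    total_n = groups * n       # total number of observations, N
    dfw = total_n - dfb - 1    # N - DFB - 1, i.e. N minus the number of groups
    print(n, dfw)              # 8 -> 21, 16 -> 45
```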
Critical F Value
The critical F value is returned by this formula:
=F.INV(1-C4,V_1,V_2)
The value in C4 is alpha, the probability of rejecting a true null hypothesis. V_1 is DFB, which does not change in response to a change in sample size. V_2 is DFW, which does change as the sample size changes. Therefore, you should expect the critical F value to change as you modify the number of observations per group.
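The same quantile is available outside Excel from the inverse CDF of the central F distribution. In the sketch below, the alpha of 0.05 is an assumption (the worksheet reads alpha from C4), and the DFW values of 21 and 45 follow from the three-group design with 8 and 16 observations per group.

```python
# Equivalent of =F.INV(1-C4, V_1, V_2): the (1 - alpha) quantile of the
# central F distribution with DFB and DFW degrees of freedom.
from scipy.stats import f

alpha = 0.05   # assumed value of cell C4
dfb = 2        # V_1: does not change with sample size

for dfw in (21, 45):           # V_2 for 8 and 16 observations per group
    # The critical F value shrinks as DFW grows.
    print(dfw, round(f.ppf(1 - alpha, dfb, dfw), 4))
```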
Final Thoughts
This series of four articles has briefly discussed the concept of statistical power and how you can increase it by making changes to the design of two-group experiments that you analyze with t-tests. With that as background, we moved on to the statistical power of F-tests, including the noncentrality parameter λ. The third and fourth articles showed you how to visualize what happens to the F distribution as you modify the sample size and the magnitude of λ. By invoking Excel's Solver, you can determine the sample size you need to reach a particular level of statistical power for a given λ.
The formulas presented in these articles do not take into account, for either t-tests or F-tests, the effect of unequal group sizes on either alpha or power. When unequal group sizes are accompanied by unequal group variances, your actual alpha and beta error rates can be either larger or smaller than you expect. This is called the Behrens-Fisher problem, and if you're not familiar with it, you might want to consult a beginning-to-intermediate text on statistical analysis.
Handling the effect of unequal sample sizes on the power calculation itself, though, is straightforward. The formulas given in this series of articles are easy to modify to account for different numbers of observations per group, and you can generally follow the adjustments given in most statistical textbooks.