Exploring Excel's Functions, Part 6: TTEST() Function
If you have taken a statistics course, you probably already know all about the Student’s T-Test calculation. As an historical note, the test is so named not because it is designed for students but because it was invented by a fellow named William Sealey Gosset—which makes no sense at all until you realize that Gosset wrote under the name Student, so the statistical test he invented bears that name. The T-Test was initially used to handle small samples for quality control at the Guinness brewery in Ireland.
Even if you have not studied statistics, you can still use the T-Test. But what’s it for? An example will help illustrate.
How Does It Work?
Suppose that you are watching a chess match and you notice that the five members of the Swedish Woman’s Team are all taller than the five members of the Australian Woman’s Team. You wonder: Are Swedes on average taller than Australians? The most direct approach is to measure the height of all women in Sweden and Australian and figure out the averages. You would have a definitive answer and would not need any statistical tests.
Of course, it isn’t feasible—you cannot measure the entire populations of all women in the two countries! Instead, you must rely on a sample, a randomly selected group from each country. In this example, the samples are the five women on each of the teams. Given that the average height of the five Swedish women is greater then the average height of the five Australian women, there are two possibilities:
- Overall, Swedish and Australian women do not differ in height. Only by chance did the Swedish team end up with taller members.
- Overall, Swedish women are taller than Australian women. This fact is reflected in the heights of the team members.
Here’s where the T-Test comes in. You can plug in the individual heights of the 10 people in the two samples, and the TTEST() function will tell you the probability that the difference between the two samples arose purely by chance. If that probability is small enough, it is safe to conclude that there is a real difference between the populations—in other words, that the difference is significant. Generally, a probability of 0.05 (5%) is considered the cutoff, but in some applications a smaller value might be required.
Valid use of the T-Test is based on the assumption that the population data are normally distributed, which means that the data, when plotted, would form the standard bell curve. This assumption is valid for the vast majority of data.
The TTEST() function takes four arguments:
TTEST(range1, range2, tails, type)
- Range1 and range2 are the two worksheet ranges in which the data from the two samples are located. They can (but do not have to) have the same number of data points.
- Tails should be the value 2 for a two-tailed test. You can also use the value 1 for a one-tailed test but that is a specialized use that I will not cover here.
- Type should usually be the value 2. I’ll explain the Type argument in more detail in the text.