- Introduction
- Determining the Stats You Need
- Entering the Formula
- Plotting the Distribution
Determining the Stats You Need
The first step in creating a frequency distribution is to decide the size and number of bins you’ll use. There are no rules because what you need depends on the nature of your data. If the data values range from 1–100, you might decide to use 10 bins, each of which is 10 units in size. You could also use 20 bins of size 5 or 5 bins of size 20—what you choose depends on the nature of the data and the kind of information you’re trying to pull out.
You also may want to consider having "less than" and/or "greater than" bins at either end of your bin range. These bins are used to catch "outliers"—data points that are below or above certain values, such as the number of people who lost less than 5 pounds. This bin would catch someone who lost 4 pounds as well as that one person who gained 20 pounds.
Next, you enter a list of bin values in the worksheet. Each value represents the upper limit of a bin. To illustrate this point, I’ll use data on the number of defects found in new cars sold over a certain period, shown in column B of Figure 1. The values extend beyond the bottom of the window, but that’s no concern. As Figure 1 shows, the values range from a low of 0 (you can’t have fewer defects than that, after all) to the mid-teens. Therefore, bins that are 2 units wide seem appropriate, with 6 bins covering the range 0 defects to 12 defects. We’ll also add a "more than 12 defects" bin at the upper end. We don’t need a "less than" bin because we know that the lowest data value is 0. Cells D3:D9 in Figure 1 show the appropriate bin values.
Figure 1 The bin values list for calculating the frequency distribution is entered in cells D3:D9.
Why is the last bin value 100? It could have been any value greater than 12 and also greater than the largest data value. A quick eyeballing of the data revealed no three-digit values, so 100 was an easy choice.