Examining Relationships with Scatterplots
One of the most important steps in examining the relationship between two variables is to create a scatterplot. A scatterplot is simply a graph that plots a score for one variable (for example, attitude toward your product) against a score on a second variable (for example, income level). Scatterplots are used to examine any general trends in the relationship between two variables.
If high scores on one variable tend to go with high scores on the second variable, a positive relationship is said to exist. If high scores on one variable are associated with low scores on the other, a negative relationship is apparent in the scatterplot. (In Chapter 15, "Recognizing the Relationships of Business Data," you'll learn to calculate these types of relationships, called correlations between variables.)
The extent to which the dots in a scatterplot cluster together in the form of a clear directional line (up, down, sideways, and so forth) indicates the strength of the relationship. (Again, we'll be talking more about these types of correlation relationships in Chapter 15, "Recognizing the Relationships of Business Data"; Chapter 17, "Are These Customers the Same or Different?"; and Chapter 18, "Getting Results from ANOVA.") Scatterplots with dots that are spread apart almost randomly represent a weak relationship between the variables. The following figure shows a scatterplot for attitudes toward your product compared to the salary levels of the subjects being interviewed.
Figure 3.12 Scatterplot with generally positive direction: As the variable on the left gets higher, the variable on the bottom also gets larger.
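To make the idea of positive and negative direction concrete, here is a minimal Python sketch. The income and attitude scores are hypothetical values invented purely for illustration (they are not the data behind the figure), and the helper names `pearson_r` and `direction` are assumptions of this sketch. It computes the standard Pearson correlation coefficient, whose sign indicates the direction of the relationship and whose size (between -1 and 1) indicates how tightly the dots cluster around a line.

```python
from math import sqrt

# Hypothetical illustration data (NOT from the book's figure):
# income in thousands of dollars and attitude scores on a 1-to-5 scale.
income = [20, 30, 40, 50, 60]
attitude = [2, 3, 3, 4, 5]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)


def direction(r):
    """Describe the direction of a relationship from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


r = pearson_r(income, attitude)
print(f"r = {r:.2f}, direction: {direction(r)}")
```

A value of r close to 1 or -1 corresponds to dots that fall nearly on a straight line; a value near 0 corresponds to dots scattered almost randomly.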
Statistical Wisdom
Factual science may collect statistics and make charts. But its predictions are, as has been well said, but past history reversed.
John Dewey, American philosopher and educator
Stem and Leaf Plots
Raw data such as the following is often referred to as an array, which is a list of numerical data. It would be nice to order this data and know what the range of accuracy is in this particular manufacturing application. One way to display this easily is to use a stem and leaf plot, shown in the diagrams that follow for an array of data for fifteen pistons manufactured in a Porsche plant. A stem and leaf plot looks somewhat like a tree; hence the name for the chart.
Accuracy data, in thousandths of a millimeter, for fifteen pistons manufactured for Porsche engines:
2.6, 0.6, 1.1, 0.1, 0.4, 2.0, 1.3, 0.8, 1.3, 1.2, 1.9, 3.2, 1.7, 2.2, 1.9
The following diagram is known as an "as they come" stem and leaf plot, which means the data is entered in the order in which it appears in the array (as shown in the preceding example).
Unordered Stem and Leaf Plot

Stem | Leaf
---|---
0 | 6 1 4 8
1 | 1 3 3 2 9 7 9
2 | 6 0 2
3 | 2
To create the type of plot shown in the preceding diagram, you must abbreviate the observations to two significant digits. In the case of the grinding accuracy data, the digit to the left of the decimal point is the stem; the digit to the right is the leaf. First write the stems in order down the page; then work along the data set, writing the leaves down as they come. Thus, for the first data point, 2.6, we write a 6 opposite the 2 stem; for the second, 0.6, we write a 6 opposite the 0 stem. The result is the unordered plot in the preceding diagram. You then order the leaves, as in the following example.
Ordered Stem and Leaf Plot

Stem | Leaf
---|---
0 | 1 4 6 8
1 | 1 2 3 3 7 9 9
2 | 0 2 6
3 | 2
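The same procedure can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` and its `ordered` flag are assumptions of the example, not part of any statistics package. It splits each observation into the digit before the decimal point (the stem) and the digit after it (the leaf), keeps the leaves as they come, and optionally sorts them.

```python
from collections import defaultdict

# The fifteen piston accuracy readings from the text (1/1000 mm).
data = [2.6, 0.6, 1.1, 0.1, 0.4, 2.0, 1.3, 0.8,
        1.3, 1.2, 1.9, 3.2, 1.7, 2.2, 1.9]


def stem_and_leaf(values, ordered=False):
    """Group one-decimal values into {stem: [leaves]}.

    Leaves are kept in arrival order unless ordered=True.
    """
    plot = defaultdict(list)
    for value in values:
        # Work in whole tenths to avoid floating-point surprises:
        # 2.6 -> 26 -> stem 2, leaf 6.
        stem, leaf = divmod(round(value * 10), 10)
        plot[stem].append(leaf)
    # List every stem from smallest to largest, even one with no leaves.
    stems = range(min(plot), max(plot) + 1)
    return {s: sorted(plot[s]) if ordered else list(plot[s]) for s in stems}


def show(plot):
    """Print one 'stem | leaves' row per stem."""
    for stem, leaves in plot.items():
        print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")


show(stem_and_leaf(data))                # leaves as they come
show(stem_and_leaf(data, ordered=True))  # leaves in order of size
```

Multiplying by 10 and rounding before splitting is a deliberate choice here: it sidesteps the small binary rounding errors that arise when subtracting decimals such as 2.6 - 2 directly.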
The advantage of first setting the figures out in order of size and not simply feeding them straight from notes into a calculator or computer program (for example, to find their average) is that the relation of each to the next can be looked at. Is there a steady progression, a noteworthy hump, a considerable gap? Simple inspection can disclose irregularities.
Furthermore, a glance at the figures gives information on their range. The smallest value is 0.1; the largest is 3.2 (in thousandths of a millimeter). Of course, if you don't have time to lay out a large data set in a stem and leaf plot, most statistics computer programs (such as the popular program SPSS) will do this for you in a snap.
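The kind of inspection described above (smallest value, largest value, and any considerable gap) can also be checked with a short Python sketch. The variable names are assumptions of this example; the data is the piston array from the text.

```python
# The fifteen piston accuracy readings from the text (1/1000 mm).
data = [2.6, 0.6, 1.1, 0.1, 0.4, 2.0, 1.3, 0.8,
        1.3, 1.2, 1.9, 3.2, 1.7, 2.2, 1.9]

# Set the figures out in order of size.
ordered = sorted(data)

# The ends of the ordered list give the range of the data.
smallest, largest = ordered[0], ordered[-1]

# Difference between each value and the next; rounding to one decimal
# removes floating-point noise from the subtraction.
gaps = [round(b - a, 1) for a, b in zip(ordered, ordered[1:])]

# Locate the widest gap and the two readings on either side of it.
widest = max(gaps)
position = gaps.index(widest)

print("smallest:", smallest)
print("largest:", largest)
print("widest gap:", widest, "between", ordered[position], "and", ordered[position + 1])
```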