Rotating Factors
How do we get from the original loadings to the second set? The original components are derived in a way that prevents them from sharing variance. Remember that the first component gets all the variance that's available to it, the second component gets all the remaining variance that's available to it, and so on with subsequent components. This approach ensures that components have no shared variance, and therefore they are uncorrelated with one another.
A consequence of this approach is that the components are what factor analysts call orthogonal: Plotted on a two-dimensional chart, the components represent axes that are at right angles to each other. Here's another way to view this aspect of principal components: If you calculate scores on two (or more) factors for each record, and correlate scores on Factor 1 with scores on Factor 2, the correlation would be 0.0 (see Figure 2). The factors represent entirely different things.
Figure 2 You get the same 0.0 correlation with both original and rotated components.
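You can see the zero correlation for yourself with the minimal sketch below, written in Python with NumPy. It assumes principal components are extracted from the correlation matrix (an eigendecomposition of standardized data). Randomly generated numbers stand in for the crime-rate table, so the values mean nothing on their own; only the near-zero correlation matters.

```python
import numpy as np

# Synthetic stand-in for the crime table: 50 rows ("states") x 7 columns ("crime rates").
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7)) @ rng.normal(size=(7, 7))

# Standardize each variable so that PCA works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# eigh returns eigenvalues in ascending order; reverse so Component 1 comes first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scores on the first two components for every record.
scores = Z @ eigvecs[:, :2]

# Correlation between Component 1 and Component 2 scores (zero up to rounding error).
r12 = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
print(f"r(C1, C2) = {r12:.2e}")
```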
Another consequence of the way in which the principal components are extracted is that the first component is as close as possible to the variables (here, Murder, Assault, and so on) taken together: the sum of the variables' squared correlations with it is the largest that any single component could achieve. Each subsequent component does the same with whatever variance remains. In fact, the loadings in Figure 1 (as I've mentioned, they're actually correlations) measure how close each variable is to each component. The higher the loading, the closer the component comes to the variable. The result is relatively high loadings for most or all of the variables on the first component, which in turn makes it difficult to interpret the meaning of that component and of any components that are subsequently extracted.
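Both claims, that loadings are correlations and that the first component captures as much as it can, are easy to check numerically. The sketch below (Python with NumPy, random stand-in data) computes loadings as eigenvectors scaled by the square roots of their eigenvalues, the usual convention for principal components of a correlation matrix, and compares them with correlations computed directly. `sum_sq_corr` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

# Synthetic stand-in data: 50 records x 7 variables (random, illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 7)) @ rng.normal(size=(7, 7))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings = eigvecs * np.sqrt(eigvals)

def sum_sq_corr(Z, w):
    """Hypothetical helper: sum over variables of squared correlation with Z @ w."""
    composite = Z @ w
    return sum(np.corrcoef(Z[:, j], composite)[0, 1] ** 2 for j in range(Z.shape[1]))

# Loadings really are correlations: compare with variable-by-score correlations.
scores = Z @ eigvecs
direct = np.corrcoef(Z, scores, rowvar=False)[:7, 7:]
print("loadings match correlations:", np.allclose(loadings, direct))

# The first component's sum of squared loadings equals its eigenvalue...
ssl_first = sum_sq_corr(Z, eigvecs[:, 0])
# ...and no randomly chosen direction captures more.
best_random = max(sum_sq_corr(Z, rng.normal(size=7)) for _ in range(1000))
print(round(ssl_first, 4), round(eigvals[0], 4), round(best_random, 4))
```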
Now suppose that, after the component extraction is complete, the components could be rotated while the measured variables remain in place. It might be possible to get a more interpretable pattern of loadings on the components. (At this point in the process, we start referring to factors instead of components.) Some variables would be closer to the rotated factors, and some would be farther away, creating a clearer pattern of loadings.
In the process of rotating the components, you have to observe some rules, or else you wind up with arbitrary and subjective decisions about how the factors behave vis-à-vis the variables. In the mid-1900s, the psychologist Louis Thurstone laid down these rules for what is now termed simple structure. Not all rotation methods follow the rules for simple structure, but if you know which rules you're following and which you're not, you can understand your results much more clearly.
I won't discuss those rules in any real detail here, but one rule that the Varimax rotation observes as it pursues simple structure is that the factors must remain orthogonal to each other. They start out at right angles to one another, and they must maintain that orientation while they rotate.
It's a real simplification, but you might think of two orthogonal factors as two spokes in a bicycle wheel, at right angles to one another. As you steer the bicycle, the wheel turns right and left, and it also tilts right and left as you lean one way or another. Those turns and tilts rotate the wheel with respect to its stationary surroundings, but the spokes maintain their original orientation to one another. Similarly, the orthogonal rotation process in factor analysis maintains the relationships between the factors as it adjusts the factors' relationships to the stationary variables.
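In matrix terms, turning both spokes together is multiplication by a rotation matrix T whose transpose is its inverse (T.T @ T is the identity). This minimal sketch in Python with NumPy rotates two unit axes by a few arbitrary angles and confirms that the angle between them stays at 90 degrees; `rotation_matrix` is a hypothetical helper, not part of any library.

```python
import numpy as np

def rotation_matrix(theta):
    """Hypothetical helper: 2 x 2 matrix that rotates a pair of axes by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Two orthogonal factor axes as unit column vectors: the two "spokes".
axes = np.eye(2)

angles = []
for degrees in (15, 30, 45, 80):          # arbitrary illustrative rotations
    T = rotation_matrix(np.deg2rad(degrees))
    turned = T @ axes                     # rotate both axes together
    cos_between = turned[:, 0] @ turned[:, 1]
    angles.append(np.degrees(np.arccos(cos_between)))
    # T.T @ T equal to the identity is what makes the rotation orthogonal.
    print(degrees, np.allclose(T.T @ T, np.eye(2)), round(angles[-1], 6))
```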
Another rule that governs the Varimax rotation is that the variables' total communalities must not change. A variable's communality with respect to a given factor is the square of the variable's loading on that factor, and its total communality is the sum of those squares across all the extracted factors. Because the loadings are correlations, their squares are proportions of variance. Orthogonality guarantees this rule: rotating the factors rigidly changes how a variable's variance is divided among them, but not the total.
If you extract and rotate as many factors as there are variables, then each variable's total communality is 1.0, because together the factors account for 100% of its variance. If you extract fewer factors than there are variables, the extracted factors account for only a portion of each variable's variance, although that portion is often quite high.
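Both statements about communalities can be checked with a short sketch in Python with NumPy, again on random stand-in data. The 25-degree angle is an arbitrary stand-in for whatever rotation Varimax would choose; any orthogonal rotation leaves the communalities unchanged. `communalities` is a hypothetical helper.

```python
import numpy as np

# Synthetic stand-in data: 50 records x 7 variables (random, illustration only).
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 7)) @ rng.normal(size=(7, 7))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs * np.sqrt(eigvals)          # loadings on all 7 components

def communalities(L):
    """Hypothetical helper: row sums of squared loadings (share of each variable's variance)."""
    return (L ** 2).sum(axis=1)

# Arbitrary orthogonal rotation standing in for a Varimax solution.
theta = np.deg2rad(25)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

two = loadings[:, :2]                          # keep two factors
h2_before = communalities(two)
h2_after = communalities(two @ T)
print("unchanged by rotation:", np.allclose(h2_before, h2_after))
print("all 7 factors give 1.0:", np.allclose(communalities(loadings), 1.0))
print("two-factor communalities:", np.round(h2_before, 3))
```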
Let's look again at the original worksheet (see Figure 3). The sum of the squared loadings appears in column F. (Keep in mind that the loadings are correlations, and therefore their squares represent percentages of shared variance.) For example, 90.542% of the variance in the Murder variable is accounted for by Factors 1 and 2, regardless of whether the factors are original components or rotated factors. What differs is the way that 90.542% is allocated across the two factors. The rotated loadings make possible a Property Crimes versus Personal Crimes interpretation, but the amount of explained variance in all seven variables remains the same as with the original, unrotated components.
Figure 3 Rotation adjusts the loadings and often clarifies the meanings of the factors.
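Figure 3 shows the rotated loadings but not how they were computed. The sketch below implements one widely used Varimax algorithm (an SVD-based iteration) in Python with NumPy. Its assumptions: `varimax` and `varimax_criterion` are hypothetical helpers, the data are random stand-ins, and it omits the Kaiser normalization that many statistics packages apply by default. Its output therefore won't necessarily match the article's worksheet.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-8):
    """Hypothetical helper: raw (un-normalized) Varimax rotation of a p x k loading matrix.

    Returns the rotated loadings and the k x k orthogonal rotation matrix.
    """
    p, k = Phi.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = Phi @ T
        # Gradient of the Varimax criterion with respect to T (up to a constant factor).
        G = Phi.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0)))
        # Closest orthogonal matrix to G (polar decomposition via SVD).
        u, s, vh = np.linalg.svd(G)
        T = u @ vh
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return Phi @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return (L ** 2).var(axis=0).sum()

# Synthetic stand-in data and two-component PCA loadings.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 7)) @ rng.normal(size=(7, 7))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

rotated, T = varimax(loadings)
print("criterion before:", round(varimax_criterion(loadings), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
```

Because the returned rotation matrix is orthogonal, the sketch respects both rules discussed above: the factors stay at right angles, and each variable's communality is unchanged.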
What happens when you chart the states on the rotated factors? The scatter chart in Figure 4 shows where each state falls on the Property Crimes factor and the Personal Crimes factor.
Figure 4 I reversed the directions of the axes to put higher x-axis scores on the left and higher y-axis scores at the bottom.
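The article doesn't say how the scores in Figure 4 were computed. One common approach, assumed in this sketch, is to standardize the component scores and multiply them by the same orthogonal matrix used to rotate the loadings. A fixed 35-degree rotation stands in for the Varimax matrix, and the data are random. The sketch confirms that the rotated scores remain uncorrelated and that each variable's correlation with them equals its rotated loading.

```python
import numpy as np

# Synthetic stand-in data: 50 records x 7 variables (random, illustration only).
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 7)) @ rng.normal(size=(7, 7))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
# Standardized component scores: unit variance and mutually uncorrelated.
F = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])

# Hypothetical stand-in for the Varimax rotation matrix: a fixed 35-degree rotation.
theta = np.deg2rad(35)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rotated_scores = F @ T
rotated_loadings = loadings @ T

# Rotated scores are still uncorrelated...
r12 = np.corrcoef(rotated_scores, rowvar=False)[0, 1]
# ...and each variable's correlation with them equals its rotated loading.
direct = np.corrcoef(Z, rotated_scores, rowvar=False)[:7, 7:]
print(f"r(F1, F2) = {r12:.2e}", np.allclose(direct, rotated_loadings))
```

To reproduce the reversed axes of Figure 4 in matplotlib, you could call `ax.invert_xaxis()` and `ax.invert_yaxis()` on the chart's axes; reversing an axis changes only the display, not the scores.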
The chart in Figure 4 is much clearer than the chart of the unrotated components we saw in part 1. In the current chart, we've identified the two factors as representing two different types of crimes: Property Crimes and Personal Crimes.
The tendencies I noted in the chart in part 1 are much more pronounced in Figure 4 above, because the rotation of the axes altered the pattern of loadings on the factors.
These trends are now apparent:
- Western states (CA, NV, AZ, CO, HI) have high scores on the Property Crimes factor.
- Eastern states (WV, MS, NC, NH, ME) have low scores on the Property Crimes factor.
- Southern states (TX, GA, FL, LA, SC, NC) have high scores on the Personal Crimes factor.
- Northern states (ND, RI, WI, MN, MA, VT) have low scores on the Personal Crimes factor.
In sum, you probably suspected from the start that the raw crime-rate data would define two factors, Property Crimes and Personal Crimes. But only by extracting the components and rotating them to clarify the loadings can you score the states on those factors and find that the regions of the U.S. cluster according to them.