12.3 From Rows to Columns: spread()
It is also possible to transform a data table from long format into wide format, that is, to spread the prices out across multiple columns. Whereas the gather() function collects multiple features into two columns, the spread() function creates multiple features from two existing columns. For example, you can take the long-format data shown in Table 12.2 and spread it out so that each observation is a band, as in Table 12.3:
    # Reshape long data (Table 12.2), spreading prices out among multiple features
    price_by_band <- spread(
      band_data_long, # data frame to spread from
      key = city,     # column indicating where to get new feature names
      value = price   # column indicating where to get new feature values
    )
Table 12.3 A “wide” data set of concert ticket prices for a set of bands. Each observation (i.e., unit of analysis) is a band, and each feature is the ticket price in a given city.
| band | Denver | Minneapolis | Portland | Seattle |
|---|---|---|---|---|
| billy_strings | 25 | 15 | 25 | 15 |
| fruition | 40 | 20 | 50 | 30 |
| greensky_bluegrass | 20 | 30 | 40 | 40 |
| trampled_by_turtles | 40 | 100 | 20 | 30 |
The spread() function takes arguments similar to those passed to the gather() function, but applies them in the opposite direction. Here, the key argument names the column that supplies the new column names, and the value argument names the column that supplies the values for those columns. The spread() function will create a new column for each unique value in the provided key column, with values taken from the value feature. In the preceding example, the new column names (e.g., "Denver", "Minneapolis") were taken from the city feature in the long format table, and the values for those columns were taken from the price feature. This process is illustrated in Figure 12.2.

Figure 12.2 The spread() function spreads out a single column into multiple columns. It creates a new column for each unique value in the provided key column (city). The values in each new column will be populated with the provided value column (price).
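To check the behavior described above, the following is a minimal, self-contained sketch. Table 12.2 is not reproduced in this section, so the band_data_long data frame here is an assumption: it is rebuilt from the values shown in Table 12.3. The sketch confirms that spread() produces one new column per unique value of the key column.

```r
library(tidyr)

# Assumed stand-in for Table 12.2, rebuilt from the values in Table 12.3
band_data_long <- data.frame(
  band = rep(
    c("billy_strings", "fruition", "greensky_bluegrass", "trampled_by_turtles"),
    each = 4
  ),
  city = rep(c("Denver", "Minneapolis", "Portland", "Seattle"), times = 4),
  price = c(
    25, 15, 25, 15,   # billy_strings
    40, 20, 50, 30,   # fruition
    20, 30, 40, 40,   # greensky_bluegrass
    40, 100, 20, 30   # trampled_by_turtles
  ),
  stringsAsFactors = FALSE
)

# Spread the city column into one column per city, filled with prices
price_by_band <- spread(band_data_long, key = city, value = price)

# The new column names are exactly the unique values of the key column
new_columns <- setdiff(colnames(price_by_band), "band")
print(new_columns)
print(price_by_band)
```

If a band had no row for some city, spread() would fill that cell with NA by default. Note also that in tidyr 1.0.0 and later, spread() is superseded by pivot_wider(); the older function still works, which is why it is used here to match the text.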
By combining gather() and spread(), you can effectively change the “shape” of your data and what concept is represented by an observation.
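As a rough illustration of changing what an observation represents, the sketch below starts from a small wide table in which each row is a band, gathers it into long format, and then spreads it again using band as the key so that each row is a city. The two-band, two-city data frame is a hypothetical subset of Table 12.3, and the name price_by_city is chosen only for this example.

```r
library(tidyr)

# Hypothetical subset of Table 12.3: each observation is a band
price_by_band <- data.frame(
  band = c("billy_strings", "fruition"),
  Denver = c(25, 40),
  Seattle = c(15, 30),
  stringsAsFactors = FALSE
)

# Gather every column except band: each observation is now a band-city pair
band_data_long <- gather(price_by_band, key = city, value = price, -band)

# Spread using band as the key: each observation is now a city
price_by_city <- spread(band_data_long, key = band, value = price)

print(price_by_city)
```

The same prices appear in all three data frames; only the unit of analysis (band, band-city pair, or city) changes.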