Data Reshaping in R
- 12.1. cbind and rbind
- 12.2. Joins
- 12.3. reshape2
- 12.4. Conclusion
As noted in Chapter 11, manipulating data takes a great deal of effort before serious analysis can begin. In this chapter we consider two situations: when the data need to be rearranged from column oriented to row oriented (or the opposite), and when the data are spread across multiple, separate sets that need to be combined into one.
There are base functions to accomplish these tasks, but we will focus on those in plyr, reshape2 and data.table.
12.1. cbind and rbind
The simplest case is when we have two datasets with either identical columns (both the number and the names) or the same number of rows. In the first case rbind stacks them row-wise; in the second, cbind sets them side by side.
As a first trivial example, we build two small datasets, the first by combining a few vectors with cbind (which returns a matrix rather than a data.frame) and the second with data.frame, and then stack them using rbind.
> # make three vectors and combine them as columns with cbind
> sport <- c("Hockey", "Baseball", "Football")
> league <- c("NHL", "MLB", "NFL")
> trophy <- c("Stanley Cup", "Commissioner's Trophy",
+             "Vince Lombardi Trophy")
> trophies1 <- cbind(sport, league, trophy)
> # make another data.frame using data.frame()
> trophies2 <- data.frame(sport=c("Basketball", "Golf"),
+                         league=c("NBA", "PGA"),
+                         trophy=c("Larry O'Brien Championship Trophy",
+                                  "Wanamaker Trophy"),
+                         stringsAsFactors=FALSE)
> # combine them into one data.frame with rbind
> trophies <- rbind(trophies1, trophies2)
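The type of each object in this example can be checked directly. The following is a minimal sketch that rebuilds the same objects so it runs on its own. It assumes only base R, and the checks shown are one of several ways to inspect the result.

```r
# rebuild the objects from the example so this block is self-contained
sport <- c("Hockey", "Baseball", "Football")
league <- c("NHL", "MLB", "NFL")
trophy <- c("Stanley Cup", "Commissioner's Trophy", "Vince Lombardi Trophy")
trophies1 <- cbind(sport, league, trophy)
trophies2 <- data.frame(sport = c("Basketball", "Golf"),
                        league = c("NBA", "PGA"),
                        trophy = c("Larry O'Brien Championship Trophy",
                                   "Wanamaker Trophy"),
                        stringsAsFactors = FALSE)

# cbind on plain vectors builds a matrix, not a data.frame
is.matrix(trophies1)

# when any argument is a data.frame, rbind uses its data.frame method,
# which converts the matrix before stacking
trophies <- rbind(trophies1, trophies2)
is.data.frame(trophies)

# three rows from trophies1 plus two from trophies2, three columns
dim(trophies)
```

The column types of the stacked result can depend on the R version (older versions converted strings to factors by default), so it may be worth inspecting them with str(trophies) before going further.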
Both cbind and rbind can take multiple arguments to combine an arbitrary number of objects. Note that it is possible to assign new column names to vectors in cbind.
> cbind(Sport = sport, Association = league, Prize = trophy)
     Sport      Association Prize
[1,] "Hockey"   "NHL"       "Stanley Cup"
[2,] "Baseball" "MLB"       "Commissioner's Trophy"
[3,] "Football" "NFL"       "Vince Lombardi Trophy"
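To illustrate combining more than two objects, here is a short sketch using made-up data (the names a, b, d, bad, stacked, wide and result are hypothetical and not from the chapter). It also shows that, for data.frames, rbind matches columns by name and stops with an error when the names differ.

```r
# three small data.frames with the same column names (illustrative data only)
a <- data.frame(id = 1:2, value = c("x", "y"), stringsAsFactors = FALSE)
b <- data.frame(id = 3L, value = "z", stringsAsFactors = FALSE)
# same columns listed in a different order; they are matched by name
d <- data.frame(value = "w", id = 4L, stringsAsFactors = FALSE)

# rbind accepts any number of arguments
stacked <- rbind(a, b, d)
nrow(stacked)

# cbind does as well, and each argument can be renamed as it is combined
wide <- cbind(First = 1:3, Second = 4:6, Third = 7:9)
colnames(wide)

# a data.frame whose column names differ cannot be stacked onto a
bad <- data.frame(id = 5L, other = "v", stringsAsFactors = FALSE)
result <- tryCatch(rbind(a, bad), error = function(e) "error")
result
```

Wrapping the failing call in tryCatch is only a convenience so the sketch runs to the end; in interactive use the error message itself is usually the quickest way to spot the mismatched name.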