Reshaping and Cleaning Common Data
6.1 Introduction
As mentioned in Chapter 4, Hadley Wickham, one of the more prominent members of the R community, introduced the concept of tidy data in a paper in the Journal of Statistical Software. Tidy data is a framework for structuring data sets so they can be easily analyzed and visualized. It can be thought of as a goal to aim for when cleaning data. Once you understand what tidy data is, collecting, analyzing, and visualizing your data becomes much easier.
What is tidy data? Hadley Wickham’s paper defines it as meeting the following criteria:
Each row is an observation.
Each column is a variable.
Each type of observational unit forms a table.
This chapter goes through the various ways to tidy data as identified in Wickham’s paper.
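To make the three criteria concrete, here is a minimal sketch that assumes Python with the pandas library. The table, its column names (name, score), and its values are invented for illustration and do not come from Wickham's paper. The wide table stores a variable (the year) in its column headers, so each row holds several observations. Reshaping it with melt() gives one observation per row and one variable per column.

```python
import pandas as pd

# Hypothetical wide table: the year is stored in the column headers,
# so each row mixes two observations for the same person.
wide = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "2019": [10, 20],
    "2020": [15, 25],
})

# melt() keeps "name" as an identifier column and stacks the remaining
# columns into two new variables: "year" (the old header) and "score".
tidy = wide.melt(id_vars="name", var_name="year", value_name="score")

print(tidy)
```

After the reshape, every row is a single (name, year, score) observation, which is the shape most grouping, plotting, and modeling functions expect.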