Common Questions About Research Data
This section presents a series of questions and answers about displaying business research data.
How many groups should I have for a histogram?
In general, you should choose enough groups to show the shape of the distribution, but not so many that the shape is lost. Choosing the number is partly a matter of aesthetic judgment, but between 5 and 15 groups, depending on the sample size, usually gives a reasonable picture. Try to keep the intervals (also known as "bin widths") equal. With equal intervals, both the height and the area of each bar are proportional to the number of subjects in that group. With unequal intervals this link is lost, and the figure can be difficult to interpret.
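To make the idea of equal intervals concrete, here is a minimal Python sketch. It is my own illustration and not a feature of any particular statistics package, and the function names `suggested_bins` and `equal_width_histogram` are hypothetical. For a starting number of groups it uses Sturges' rule (1 + log2 of the sample size, rounded up), clamped to the 5 to 15 range mentioned above. That rule is an assumption on my part, so you should still adjust the number by eye.

```python
import math

def suggested_bins(n, low=5, high=15):
    """Starting number of groups: Sturges' rule, clamped to [low, high]."""
    k = math.ceil(math.log2(n)) + 1
    return max(low, min(high, k))

def equal_width_histogram(values, k):
    """Split the data range into k equal-width bins; return (edges, counts)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; choose the bin range manually")
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    counts = [0] * k
    for v in values:
        # Bins are half-open [left, right); the maximum value goes in the last bin.
        i = min(int((v - lo) / width), k - 1)
        counts[i] += 1
    return edges, counts

if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up values for illustration
    edges, counts = equal_width_histogram(data, 4)
    for left, right, c in zip(edges, edges[1:], counts):
        # Equal widths mean bar length (and area) is proportional to the count.
        print(f"{left:g} to {right:g}: {'#' * c}")
```

Because every bin has the same width, the number of `#` characters in each row is directly proportional to the count. With unequal bins you would have to divide each count by its bin width to keep the area of each bar proportional to the number of subjects.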
What is the distinction between a histogram and a bar chart?
Alas, modern graphics programs often blur the distinction. A histogram shows the distribution of a continuous variable, and because the variable is continuous, there should be no gaps between the bars. A bar chart shows the distribution of a discrete or categorical variable, so the bars are drawn with spaces between them. It is a mistake to use a bar chart to display a summary statistic such as a mean, particularly when it is accompanied by some measure of variation. It is better to use a box-whisker plot, which most computer statistics programs can produce. A box-whisker plot (or box plot) summarizes a distribution with a box that runs from the first quartile to the third quartile, so it contains the middle 50 percent of the data, with a line inside the box at the median. Lines called whiskers extend from the box toward the extremes of the data. In the simplest version they reach the minimum and maximum values, although many programs stop them at 1.5 times the interquartile range and plot any points beyond that as individual outliers.
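The five numbers a basic box-whisker plot draws can be computed with Python's standard `statistics` module, as in the minimal sketch below. The function name `five_number_summary` is my own. Be aware that statistics programs use several slightly different conventions for calculating quartiles, so your software may report Q1 and Q3 values that differ a little from these. Choosing `method="inclusive"` here is an assumption, not a standard.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a simple box-whisker plot."""
    # quantiles() sorts the data itself and returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

if __name__ == "__main__":
    sample = [7, 2, 9, 4, 15, 4, 12, 5, 8]  # made-up values for illustration
    low, q1, med, q3, high = five_number_summary(sample)
    print(f"whisker {low} | box {q1} to {q3}, median {med} | whisker {high}")
    print(f"interquartile range (length of the box): {q3 - q1}")
```

Unlike a bar showing only a mean, these five numbers show where the middle half of the data lies and how far the extremes reach.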
What is the best way to display data?
The general principle is to show the original data as far as possible, and not to let the graph or chart obscure the design of your research or your business findings. Within the constraints of legibility, show as much information as possible. For example, when displaying the relationship between two quantitative variables, use a scatter plot rather than grouping one or both of the variables into categories.
There are many other ways to graph and chart data. If you have a statistical program such as SPSS (one of the most popular), you'll find instructions for creating a hundred or more chart types and variations for displaying statistical data. I'll show a few more of these charts and graphs as I continue through the book.
The Least You Need to Know
- Raw data should be ordered, summarized, and displayed in appropriate tables, charts, or graphs before you apply other statistical techniques.
- Some of the most common ways to display raw data are tables, pie charts, line diagrams (or plots), stem-and-leaf plots, histograms, scatter diagrams, and frequency polygons.
- One of the most basic (and most important) statistical tables is the frequency table. Many charts and graphs can be created from a frequency table.
- Categorical variables are generally easier to summarize; for this reason, quantitative variables are often converted to categorical ones for descriptive purposes.
- Categorizing quantitative variables is useful for summarizing results, but it is not usually a good idea for statistical analysis, because grouping the values discards information.
- Charts and graphs help you spot trends, anomalies, and relationships in your data without performing more complicated statistical analyses.
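As a minimal sketch of the frequency-table and categorization points above, the Python below groups a made-up list of ages into categories and builds a frequency table with percentages and cumulative percentages. The cut points in `age_group` and both function names are hypothetical and chosen only for illustration. Notice that once the ages are grouped, the table can no longer tell you that 52 and 61 are different ages, which is exactly the information lost when a quantitative variable is categorized.

```python
from collections import Counter

def age_group(age):
    """Hypothetical cut points, chosen for illustration only."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"

def frequency_table(categories, order):
    """Return rows of (category, count, percent, cumulative percent) in the given order."""
    counts = Counter(categories)
    total = len(categories)
    rows, running = [], 0
    for cat in order:
        c = counts[cat]  # Counter returns 0 for a category that never occurs
        running += c
        rows.append((cat, c, 100 * c / total, 100 * running / total))
    return rows

if __name__ == "__main__":
    ages = [23, 35, 41, 52, 29, 47, 61, 38, 33, 55]  # made-up sample
    groups = [age_group(a) for a in ages]
    for row in frequency_table(groups, ["under 30", "30-49", "50 and over"]):
        print("{:<12} {:>3} {:>6.1f}% {:>6.1f}%".format(*row))
```

The same rows could feed a pie chart or a bar chart of the age groups.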