Getting Started with Data Science: Hypothetically Speaking

By Murtaza Haider
Feb 2, 2016

This chapter is from the book Getting Started with Data Science: Making Sense of Data with Analytics

Learn More Buy

Analysis of Variance

Analysis of variance (ANOVA) is the standard method for comparing means across three or more groups. The null hypothesis in this case states that the mean values do not differ across the groups. The alternative hypothesis states that at least one group mean differs from the rest.

I use the F-test for ANOVA. If the probability (p-value) associated with the F-test is greater than the threshold value, which is usually .05 for the 95% confidence level, we fail to reject the null hypothesis. If the p-value is less than .05, we reject the null hypothesis and conclude that at least one mean value differs from the rest.
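The decision rule above can be sketched in R. This is a minimal illustration on simulated data, not the teaching evaluations dataset: the data frame `scores`, its columns, and the group means are all made up for the example. It runs `aov`, pulls the F statistic and p-value out of the summary table, and recomputes F by hand as the ratio of the between-group mean square to the within-group mean square.

```r
# Simulated data only: three hypothetical groups with made-up scores.
set.seed(42)
scores <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 30)),
  value = c(rnorm(30, mean = 4.0, sd = 0.5),
            rnorm(30, mean = 4.0, sd = 0.5),
            rnorm(30, mean = 4.6, sd = 0.5))
)

# One-way ANOVA; the first element of the summary is the ANOVA table.
fit <- aov(value ~ group, data = scores)
tab <- summary(fit)[[1]]
f.stat  <- tab[["F value"]][1]
p.value <- tab[["Pr(>F)"]][1]

# The same F statistic computed by hand.
grand.mean  <- mean(scores$value)
group.means <- tapply(scores$value, scores$group, mean)
group.n     <- tapply(scores$value, scores$group, length)
k <- length(group.means)   # number of groups
n <- nrow(scores)          # total number of observations

ss.between <- sum(group.n * (group.means - grand.mean)^2)
ss.within  <- sum((scores$value - group.means[as.character(scores$group)])^2)
f.manual   <- (ss.between / (k - 1)) / (ss.within / (n - k))
p.manual   <- pf(f.manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Apply the decision rule with the usual .05 threshold.
alpha <- 0.05
decision <- if (p.value < alpha) "reject H0" else "fail to reject H0"
print(c(F = f.stat, p = p.value))
print(decision)
```

The manual calculation is there only to show what the F-test measures: a large F means the group means are spread out relative to the variation within the groups.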

I will repeat the comparison of means for the three age groups using the ANOVA test. The R code and the resulting output (see Figure 6.38) follow.

Figure 6.38 ANOVA output for influence of age on teaching evaluations

Note that the value reported under Pr(>F) is 0.0998, which is greater than 0.05. Thus, we fail to reject the null hypothesis: the data do not provide sufficient evidence that the teaching evaluations differ by age group.

Let us test the average teaching evaluations against a discretized version of beauty, which in raw form is a continuous variable. I convert the continuous variable into three categories, namely low beauty, average looking, and good looking. The R code and the resulting output (see Figure 6.39) follow.

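# Split the continuous beauty score into three equal-width intervals
x$f.beauty <- cut(x$beauty, breaks = 3)
# Relabel the intervals with descriptive names
x$f.beauty <- factor(x$f.beauty,
                     labels = c("low beauty", "average looking", "good looking"))
# Mean evaluation and number of observations in each beauty category
cbind(mean.eval = tapply(x$eval, x$f.beauty, mean),
      observations = table(x$f.beauty))
# One-way ANOVA of teaching evaluations on the beauty categories
summary(aov(eval ~ f.beauty, data = x))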

Figure 6.39 ANOVA output for influence of beauty on teaching evaluations

The probability value associated with the F-test is 0.0276, which is less than .05, our threshold value. I therefore reject the null hypothesis and conclude that teaching evaluations differ by students’ perception of instructors’ appearance.
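One detail of the discretization is worth seeing in isolation: when `breaks` is a single number, `cut` divides the range of the variable into intervals of equal width, so the three categories need not contain the same number of observations. The sketch below uses a simulated stand-in for the beauty score (the vector `beauty` and its values are assumptions for illustration, not the book's data) and also shows that the labels can be supplied to `cut` directly.

```r
# Simulated stand-in for a continuous beauty score (not the book's data).
set.seed(1)
beauty <- rnorm(200)

# Three equal-width intervals spanning the observed range.
bins <- cut(beauty, breaks = 3)
print(levels(bins))   # interval labels generated by cut()
print(table(bins))    # counts per interval, typically unequal

# Descriptive labels, applied in two steps and in one step.
labs <- c("low beauty", "average looking", "good looking")
two.step <- factor(bins, labels = labs)
one.step <- cut(beauty, breaks = 3, labels = labs)
print(table(one.step))
```

Because the intervals are equal in width rather than in count, the extreme categories can end up with few observations, which is worth checking in the observation counts before reading the ANOVA output.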
