---
title: "Chapter 4 Brazil table"
author: Murtaza Haider
date: June 20, 2015
---
Subsetting to Brazil
Downloading data from the website. The download command downloads data in the working directory.
Data are downloaded, loaded, and names of fields are read from the file:
setInternet2(TRUE) # solution for https files
download.file("https://sites.google.com/site/econometriks/docs/brazil.RData", "brazil2.RData")
load("brazil2.Rdata")
#load("F:/Research/All_else_being_equal/Chapters/Ch_4_serving tables/R files/Brazil/brazil.RData")
names(brazil)
## [1] "country" "satisfied" "internet" "email"
## [5] "cell" "social" "gender" "age"
## [9] "education" "income" "ethnicity" "pol.party"
## [13] "weight" "new.cell" "new.email" "new.internet"
## [17] "new.social" "new.satis" "new.inc" "new.age"
There are 20 variables in the file.
Never seen more serene vistas than Rio. A city in continuous celebration. Growing market, identified by Goldman Sachs in BRICS.
Focus: Internet and social media.
Data: Brazil
Question: What has age, sex, education, income, ethnicity, and political affiliation to do with the use of Internet and social media?
Illustration of tables Data set has been cleaned for the following: * Subsetted * Column names changed to more common sense explanations * new derived variables.
Start with summary statistics. Display the actual output and then show it should be printed. Using Excel and HTML and the Word files.
Starting with the first variable: country.
R specific quirk with factors
Fixing the country variable with refactoring.
library(xtable)
library(htmlTable)
Uncorrected results
tab<-xtable(cbind("country"=table(brazil$country)))
print(tab, type="html")
country | |
---|---|
argentina | 0 |
brazil | 1000 |
britain | 0 |
china | 0 |
egypt | 0 |
france | 0 |
germany | 0 |
india | 0 |
indonesia | 0 |
japan | 0 |
jordan | 0 |
kenya | 0 |
lebanon | 0 |
mexico | 0 |
nigeria | 0 |
pakistan | 0 |
poland | 0 |
russia | 0 |
south korea | 0 |
spain | 0 |
turkey | 0 |
us | 0 |
Corrected results
brazil$country<-factor(brazil$country)
tab<-xtable(cbind("country"=table(brazil$country)))
print(tab, type="html")
country | |
---|---|
brazil | 1000 |
tab1<-(cbind("internet"=xtabs(weight ~ new.internet),
"email"=xtabs(weight~new.email),
"social media"=xtabs(weight~new.social),
"cell"=xtabs(weight~new.cell)
))
tab2<-xtable(tab1)
print(tab2, type="html")
internet | social media | cell | ||
---|---|---|---|---|
no | 569.99 | 639.78 | 104.75 | 271.49 |
yes | 427.57 | 359.38 | 327.90 | 728.13 |
tab3<- xtable(addmargins((prop.table(tab1,2)*100),1))
print(tab3, type="html")
internet | social media | cell | ||
---|---|---|---|---|
no | 57.14 | 64.03 | 24.21 | 27.16 |
yes | 42.86 | 35.97 | 75.79 | 72.84 |
Sum | 100.00 | 100.00 | 100.00 | 100.00 |
See the new command with digits=0 in the xtable command
Table 4.7
tab1<-(cbind("internet"=xtabs(weight ~ new.internet),
"email"=xtabs(weight~new.email),
"social media"=xtabs(weight~new.social),
"cell"=xtabs(weight~new.cell)
))
htmlTable(txtRound(tab1,0), cgroup = c("internet & digital media users"),
n.cgroup = c(4), rowlabel = "User status")
internet & digital media users | ||||
---|---|---|---|---|
User status | internet | social media | cell | |
no | 570 | 640 | 105 | 271 |
yes | 428 | 359 | 328 | 728 |
# tab3<- xtable(addmargins((prop.table(tab1,2)*100),1), digits=0)
# print(tab3, type="html")
With SAPPLY and weighted data set
Table 4.8
tab1<-sapply(brazil[,14:17],function(x,y) xtabs(y ~ x), y=brazil[,13])
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
new.cell | new.email | new.internet | new.social | |
---|---|---|---|---|
no | 271 | 640 | 570 | 105 |
yes | 728 | 359 | 428 | 328 |
tab3<- txtRound(addmargins((prop.table(tab1,2)*100),1),0)
htmlTable(tab3, rowlabel = "User status")
User status | new.cell | new.email | new.internet | new.social |
---|---|---|---|---|
no | 27 | 64 | 57 | 24 |
yes | 73 | 36 | 43 | 76 |
Sum | 100 | 100 | 100 | 100 |
Using the recoded variables for income, gender, education and others.
First the tabulating explanatory variables:
Gender
tab1<-cbind("Gender"=xtabs(weight ~ gender))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
Gender | |
---|---|
male | 503 |
female | 497 |
tab1<-cbind("Gender"=xtabs(weight ~ gender+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no | yes | |
---|---|---|
male | 274 | 229 |
female | 296 | 199 |
tab3<-xtable(addmargins((prop.table(tab1,1)*100),2),digits=1)
print(tab3, type="html")
no | yes | Sum | |
---|---|---|---|
male | 54.5 | 45.5 | 100.0 |
female | 59.9 | 40.1 | 100.0 |
Table 4.9 and 4.10
tab1<-cbind(xtabs(weight ~ gender+new.internet))
tab2<-txtRound(tab1, 0)
htmlTable(tab2,
cgroup = c("internet users"), n.cgroup = c(2),
rgroup=c("Gender"), n.rgroup=c(2))
internet users | ||
---|---|---|
no | yes | |
Gender | ||
male | 274 | 229 |
female | 296 | 199 |
tab3<-addmargins((prop.table(tab1,1)*100),2)
tab4<-txtRound(tab3,1)
htmlTable(tab4,
cgroup = c("internet users"), n.cgroup = c(3),
rowlabel = "Gender")
internet users | |||
---|---|---|---|
Gender | no | yes | Sum |
male | 54.5 | 45.5 | 100.0 |
female | 59.9 | 40.1 | 100.0 |
tab1<-cbind("Education"=xtabs(weight ~ education))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
Education | |
---|---|
no formal education | 27 |
incomplete primary school | 96 |
complete primary school | 176 |
incomplete secondary school | 112 |
complete secondary school | 127 |
incomplete tertiary school | 66 |
complete tertiary school | 269 |
incomplete university | 52 |
complete university | 77 |
don’t know | 0 |
refused | 0 |
tab1<-cbind("Education"=xtabs(weight ~ education+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no | yes | |
---|---|---|
no formal education | 27 | 0 |
incomplete primary school | 93 | 3 |
complete primary school | 160 | 16 |
incomplete secondary school | 82 | 29 |
complete secondary school | 70 | 56 |
incomplete tertiary school | 27 | 39 |
complete tertiary school | 98 | 171 |
incomplete university | 7 | 45 |
complete university | 7 | 70 |
don’t know | 0 | 0 |
refused | 0 | 0 |
Table 4.11
tab1<-cbind("Education"=xtabs(weight ~ education+new.internet))
tab2<-txtRound(addmargins((prop.table(tab1,1)*100),2),1)
htmlTable(tab2, cgroup = c("internet users"), n.cgroup = c(3),
rowlabel = "Education attainment")
internet users | |||
---|---|---|---|
Education attainment | no | yes | Sum |
no formal education | 100.0 | 0.0 | 100.0 |
incomplete primary school | 96.8 | 3.2 | 100.0 |
complete primary school | 91.0 | 9.0 | 100.0 |
incomplete secondary school | 74.0 | 26.0 | 100.0 |
complete secondary school | 55.8 | 44.2 | 100.0 |
incomplete tertiary school | 40.6 | 59.4 | 100.0 |
complete tertiary school | 36.4 | 63.6 | 100.0 |
incomplete university | 13.0 | 87.0 | 100.0 |
complete university | 9.3 | 90.7 | 100.0 |
don’t know | |||
refused |
Income
tab1<-cbind("income"=xtabs(weight ~ income))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
income | |
---|---|
up to r$ 200,00 | 28 |
from r$ 201,00 to 400,00 | 60 |
from 401,00 to 600,00 | 167 |
from 601,00 to 1000,00 | 225 |
from 1001,00 to 1400,00 | 176 |
from 1401,00 to 2000,00 | 139 |
from 2001,00 to 3000,00 | 75 |
from 3001,00 to 4000,00 | 30 |
from 4001,00 to 6000,00 | 20 |
from 6001,00 to 10.000,00 | 10 |
from 10.001,00 to 15.000,00 | 1 |
over 15.000,00 | 0 |
don’t know | 22 |
refused | 48 |
tab1<-cbind("income"=xtabs(weight ~ income+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no | yes | |
---|---|---|
up to r$ 200,00 | 23 | 5 |
from r$ 201,00 to 400,00 | 43 | 16 |
from 401,00 to 600,00 | 135 | 32 |
from 601,00 to 1000,00 | 148 | 76 |
from 1001,00 to 1400,00 | 99 | 76 |
from 1401,00 to 2000,00 | 59 | 80 |
from 2001,00 to 3000,00 | 20 | 56 |
from 3001,00 to 4000,00 | 9 | 21 |
from 4001,00 to 6000,00 | 4 | 16 |
from 6001,00 to 10.000,00 | 1 | 9 |
from 10.001,00 to 15.000,00 | 0 | 1 |
over 15.000,00 | 0 | 0 |
don’t know | 12 | 10 |
refused | 18 | 30 |
tab3<-xtable(addmargins((prop.table(tab1,1)*100),2),digits=1)
print(tab3, type="html")
no | yes | Sum | |
---|---|---|---|
up to r$ 200,00 | 83.8 | 16.2 | 100.0 |
from r$ 201,00 to 400,00 | 73.0 | 27.0 | 100.0 |
from 401,00 to 600,00 | 81.0 | 19.0 | 100.0 |
from 601,00 to 1000,00 | 66.0 | 34.0 | 100.0 |
from 1001,00 to 1400,00 | 56.5 | 43.5 | 100.0 |
from 1401,00 to 2000,00 | 42.4 | 57.6 | 100.0 |
from 2001,00 to 3000,00 | 26.0 | 74.0 | 100.0 |
from 3001,00 to 4000,00 | 30.4 | 69.6 | 100.0 |
from 4001,00 to 6000,00 | 18.6 | 81.4 | 100.0 |
from 6001,00 to 10.000,00 | 6.7 | 93.3 | 100.0 |
from 10.001,00 to 15.000,00 | 0.0 | 100.0 | 100.0 |
over 15.000,00 | 0.0 | 100.0 | 100.0 |
don’t know | 53.6 | 46.4 | 100.0 |
refused | 37.4 | 62.6 | 100.0 |
Table 4.12
tab1<-cbind("Education"=xtabs(weight ~ income+new.internet))
tab2<-txtRound(addmargins((prop.table(tab1,1)*100),2),1)
htmlTable(tab2, cgroup = c("internet users"), n.cgroup = c(3),
rowlabel = "Income level")
internet users | |||
---|---|---|---|
Income level | no | yes | Sum |
up to r$ 200,00 | 83.8 | 16.2 | 100.0 |
from r$ 201,00 to 400,00 | 73.0 | 27.0 | 100.0 |
from 401,00 to 600,00 | 81.0 | 19.0 | 100.0 |
from 601,00 to 1000,00 | 66.0 | 34.0 | 100.0 |
from 1001,00 to 1400,00 | 56.5 | 43.5 | 100.0 |
from 1401,00 to 2000,00 | 42.4 | 57.6 | 100.0 |
from 2001,00 to 3000,00 | 26.0 | 74.0 | 100.0 |
from 3001,00 to 4000,00 | 30.4 | 69.6 | 100.0 |
from 4001,00 to 6000,00 | 18.6 | 81.4 | 100.0 |
from 6001,00 to 10.000,00 | 6.7 | 93.3 | 100.0 |
from 10.001,00 to 15.000,00 | 0.0 | 100.0 | 100.0 |
over 15.000,00 | 0.0 | 100.0 | 100.0 |
don’t know | 53.6 | 46.4 | 100.0 |
refused | 37.4 | 62.6 | 100.0 |
Age
tab1<-cbind("age"=xtabs(weight ~ new.age))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
age | |
---|---|
middle aged | 311 |
seniors | 252 |
young adults | 231 |
youth | 207 |
tab1<-cbind("age"=xtabs(weight ~ new.age+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no | yes | |
---|---|---|
middle aged | 188 | 122 |
seniors | 212 | 38 |
young adults | 110 | 121 |
youth | 60 | 147 |
tab3<-xtable(addmargins((prop.table(tab1,1)*100),2),digits=1)
print(tab3, type="html")
no | yes | Sum | |
---|---|---|---|
middle aged | 60.7 | 39.3 | 100.0 |
seniors | 85.0 | 15.0 | 100.0 |
young adults | 47.5 | 52.5 | 100.0 |
youth | 28.8 | 71.2 | 100.0 |
Table 4.13
tab1<-cbind("Age cohort"=xtabs(weight ~ new.age+new.internet))
tab2<-txtRound(addmargins((prop.table(tab1,1)*100),2),1)
htmlTable(tab2, cgroup = c("internet users"), n.cgroup = c(3),
rowlabel = "Age cohorts")
internet users | |||
---|---|---|---|
Age cohorts | no | yes | Sum |
middle aged | 60.7 | 39.3 | 100.0 |
seniors | 85.0 | 15.0 | 100.0 |
young adults | 47.5 | 52.5 | 100.0 |
youth | 28.8 | 71.2 | 100.0 |
Age + Gender
round(ftable(xtabs(weight~gender+new.age+new.internet)),0)
## new.internet no yes
## gender new.age
## male middle aged 85 68
## seniors 106 18
## young adults 52 65
## youth 31 78
## female middle aged 103 53
## seniors 107 20
## young adults 58 56
## youth 29 69
GLM.1 –> unweighted model GLM.2 –> weighted model
GLM.1 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit), data=brazil)
GLM.2 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit),
weights=weight, data=brazil)
## Warning: non-integer #successes in a binomial glm!
logit1<-xtable(summary(GLM.1))
print(logit1, type="html")
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | 0.0940 | 0.3513 | 0.27 | 0.7892 |
genderfemale | -0.2467 | 0.1738 | -1.42 | 0.1559 |
new.ageseniors | -1.4793 | 0.2529 | -5.85 | 0.0000 |
new.ageyoung adults | 0.5988 | 0.2209 | 2.71 | 0.0067 |
new.ageyouth | 1.6487 | 0.2541 | 6.49 | 0.0000 |
new.cellyes | 1.6515 | 0.2411 | 6.85 | 0.0000 |
new.inclow | -2.9382 | 0.3186 | -9.22 | 0.0000 |
new.incmedium | -1.9228 | 0.2736 | -7.03 | 0.0000 |
logit2<-xtable(summary(GLM.2))
print(logit2, type="html")
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | 0.1594 | 0.3373 | 0.47 | 0.6364 |
genderfemale | -0.1748 | 0.1645 | -1.06 | 0.2878 |
new.ageseniors | -1.3409 | 0.2627 | -5.11 | 0.0000 |
new.ageyoung adults | 0.5527 | 0.2067 | 2.67 | 0.0075 |
new.ageyouth | 1.5403 | 0.2244 | 6.86 | 0.0000 |
new.cellyes | 1.6031 | 0.2288 | 7.01 | 0.0000 |
new.inclow | -2.9232 | 0.3101 | -9.43 | 0.0000 |
new.incmedium | -1.9146 | 0.2696 | -7.10 | 0.0000 |
expon<-cbind("unweighted model"=exp(coef(GLM.1)), "weighted model"=exp(coef(GLM.2)))
print(xtable(expon,digits=3), type="html")
unweighted model | weighted model | |
---|---|---|
(Intercept) | 1.099 | 1.173 |
genderfemale | 0.781 | 0.840 |
new.ageseniors | 0.228 | 0.262 |
new.ageyoung adults | 1.820 | 1.738 |
new.ageyouth | 5.200 | 4.666 |
new.cellyes | 5.215 | 4.969 |
new.inclow | 0.053 | 0.054 |
new.incmedium | 0.146 | 0.147 |
GLM.1 –> unweighted model GLM.2 –> weighted model
GLM.1 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit), data=brazil)
GLM.2 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit),
weights=weight, data=brazil)
## Warning: non-integer #successes in a binomial glm!
stargazer(GLM.1, GLM.2, type="html", align=TRUE, no.space=TRUE,
dep.var.labels=c("Internet use (unweighted data), Internet use (weighted data)"),
covariate.labels=c("Female", "seniors","young adults", "youth", "own cell phone",
"low income hhld", "medium income hhld"))
Dependent variable: | ||
Internet use (unweighted data), Internet use (weighted data) | ||
(1) | (2) | |
Female | -0.247 | -0.175 |
(0.174) | (0.164) | |
seniors | -1.479*** | -1.341*** |
(0.253) | (0.263) | |
young adults | 0.599*** | 0.553*** |
(0.221) | (0.207) | |
youth | 1.649*** | 1.540*** |
(0.254) | (0.224) | |
own cell phone | 1.652*** | 1.603*** |
(0.241) | (0.229) | |
low income hhld | -2.938*** | -2.923*** |
(0.319) | (0.310) | |
medium income hhld | -1.923*** | -1.915*** |
(0.274) | (0.270) | |
Constant | 0.094 | 0.159 |
(0.351) | (0.337) | |
Observations | 928 | 928 |
Log Likelihood | -411.062 | -439.018 |
Akaike Inf. Crit. | 838.125 | 894.035 |
Note: | p<0.1; p<0.05; p<0.01 |