Chapter 4: Serving Tables

---
title: "Chapter 4 Brazil table"
author: Murtaza Haider
date: June 20, 2015
---

Using PEW 2010 data set from Global Attitudes

Subsetting to Brazil

Downloading data from the website. The download command downloads data in the working directory.

Data are downloaded, loaded, and names of fields are read from the file:

setInternet2(TRUE) # solution for https files
download.file("https://sites.google.com/site/econometriks/docs/brazil.RData", "brazil2.RData")
load("brazil2.Rdata")
#load("F:/Research/All_else_being_equal/Chapters/Ch_4_serving tables/R files/Brazil/brazil.RData")
names(brazil)
##  [1] "country"      "satisfied"    "internet"     "email"       
##  [5] "cell"         "social"       "gender"       "age"         
##  [9] "education"    "income"       "ethnicity"    "pol.party"   
## [13] "weight"       "new.cell"     "new.email"    "new.internet"
## [17] "new.social"   "new.satis"    "new.inc"      "new.age"

There are 20 variables in the file.

Story Line

Never seen more serene vistas than Rio. A city in continuous celebration. Growing market, identified by Goldman Sachs in BRICS.

Focus: Internet and social media.

Data: Brazil

Question: What has age, sex, education, income, ethnicity, and political affiliation to do with the use of Internet and social media?

Illustration of tables Data set has been cleaned for the following: * Subsetted * Column names changed to more common sense explanations * new derived variables.

Start with summary statistics. Display the actual output and then show it should be printed. Using Excel and HTML and the Word files.

Descriptive statistics

Starting with the first variable: country.

R specific quirk with factors

Fixing the country variable with refactoring.

library(xtable)
library(htmlTable)

Uncorrected results

tab<-xtable(cbind("country"=table(brazil$country)))
print(tab, type="html")
country
argentina 0
brazil 1000
britain 0
china 0
egypt 0
france 0
germany 0
india 0
indonesia 0
japan 0
jordan 0
kenya 0
lebanon 0
mexico 0
nigeria 0
pakistan 0
poland 0
russia 0
south korea 0
spain 0
turkey 0
us 0

Corrected results

brazil$country<-factor(brazil$country)
tab<-xtable(cbind("country"=table(brazil$country)))
print(tab, type="html")
country
brazil 1000

What is the status of Internet and social media

Basic descriptives

table(brazil$internet)
## 
##        yes         no don’t know    refused 
##        358        638          4          0

With xtables

Table 4.2

tab<-cbind("internet"=table(brazil$internet))
htmlTable(tab, rowlabel = "User status")
User status internet
yes 358
no 638
don’t know 4
refused 0
tab<-xtable(cbind("internet"=table(brazil$internet)))
print(tab, type="html")
internet
yes 358
no 638
don’t know 4
refused 0

Adding more columns

tab<-xtable(cbind("internet"=table(brazil$internet),
                  "email"=table(brazil$email),
                  "social media"=table(brazil$social),
                  "cell"=table(brazil$cell)
                  ))
print(tab, type="html")
internet email social media cell
yes 358 296 265 675
no 638 702 97 324
don’t know 4 2 0 1
refused 0 0 0 0
library(htmlTable)
tab<-xtable(cbind("internet"=table(brazil$internet),
                  "email"=table(brazil$email),
                  "social media"=table(brazil$social),
                  "cell"=table(brazil$cell)
                  ))
htmlTable(tab, 
          cgroup = c("internet & digital media users"), 
          n.cgroup = c(4))
internet & digital media users
internet email social media cell
yes 358 296 265 675
no 638 702 97 324
don’t know 4 2 0 1
refused 0 0 0 0

Using sapply and function

Table 4.3

tab<-sapply(brazil[,3:6],function(x) table(x))
htmlTable(tab, cgroup = c("internet & digital media users"), 
          n.cgroup = c(4), rowlabel = "User status")
internet & digital media users
User status internet email cell social
yes 358 296 675 265
no 638 702 324 97
don’t know 4 2 1 0
refused 0 0 0 0

Proportions ..

Table 4.4

tab<-(cbind("internet"=table(brazil$internet),
                  "email"=table(brazil$email),
                  "social media"=table(brazil$social),
                  "cell"=table(brazil$cell)
                  ))

tab1<- addmargins((prop.table(tab,2)*100),1)
htmlTable(txtRound(tab1,1), cgroup = c("internet & digital media users"), 
          n.cgroup = c(4), rowlabel = "User status")
internet & digital media users
User status internet email social media cell
yes 35.8 29.6 73.2 67.5
no 63.8 70.2 26.8 32.4
don’t know 0.4 0.2 0.0 0.1
refused 0.0 0.0 0.0 0.0
Sum 100.0 100.0 100.0 100.0

Proportions with sapply, and rounding the output

Table 4.5

tab<-sapply(brazil[,3:6],function(x) table(x))
tab1<- addmargins((prop.table(tab,2)*100),1)
htmlTable(txtRound(tab1,0), cgroup = c("internet & digital media users"), 
          n.cgroup = c(4), rowlabel = "User status")
internet & digital media users
User status internet email cell social
yes 36 30 68 73
no 64 70 32 27
don’t know 0 0 0 0
refused 0 0 0 0
Sum 100 100 100 100

know your data .. The question for social media is asked of those with internet

tab1<-xtable(table(brazil$social, brazil$internet))
print(tab1, type="html")
yes no don’t know refused
yes 263 2 0 0
no 95 2 0 0
don’t know 0 0 0 0
refused 0 0 0 0
tab2<-xtable(table(brazil$cell, brazil$internet))
print(tab2, type="html")
yes no don’t know refused
yes 325 350 0 0
no 33 288 3 0
don’t know 0 0 1 0
refused 0 0 0 0
tab1<-xtable(table(brazil$social, brazil$internet))
htmlTable(tab1, 
          cgroup = c("internet users"), n.cgroup = c(4),
          rgroup=c("social media users"), n.rgroup=c(4))
internet users
yes no don’t know refused
social media users
  yes 263 2 0 0
  no 95 2 0 0
  don’t know 0 0 0 0
  refused 0 0 0 0
tab1<-xtable(table(brazil$social, brazil$internet))
htmlTable(tab1, 
          cgroup = c("internet users"), n.cgroup = c(4),
         rowlabel = "social media users")
internet users
social media users yes no don’t know refused
yes 263 2 0 0
no 95 2 0 0
don’t know 0 0 0 0
refused 0 0 0 0

Cleaned data set without don’t knows and refused

Table 4.6

tab<-(cbind("internet"=table(brazil$new.internet),
                  "email"=table(brazil$new.email),
                  "social media"=table(brazil$new.social),
                  "cell"=table(brazil$new.cell)
                  ))
tab1<- addmargins((prop.table(tab,2)*100),1)
htmlTable(txtRound(tab1,1), cgroup = c("internet & digital media users"), 
          n.cgroup = c(4), rowlabel = "User status")
internet & digital media users
User status internet email social media cell
no 64.1 70.3 26.8 32.4
yes 35.9 29.7 73.2 67.6
Sum 100.0 100.0 100.0 100.0

Using the attach option

attach(brazil)
tab1<-(cbind("internet"=xtabs( ~ new.internet),
                  "email"=xtabs(~new.email),
                  "social media"=xtabs(~new.social),
                  "cell"=xtabs(~new.cell)
                  ))
tab2<-xtable(tab1)
print(tab2, type="html")
internet email social media cell
no 638 702 97 324
yes 358 296 265 675
tab3<- xtable(addmargins((prop.table(tab1,2)*100),1))
print(tab3, type="html")
internet email social media cell
no 64.06 70.34 26.80 32.43
yes 35.94 29.66 73.20 67.57
Sum 100.00 100.00 100.00 100.00

Weighted data set using xtabs

tab1<-(cbind("internet"=xtabs(weight ~ new.internet),
                  "email"=xtabs(weight~new.email),
                  "social media"=xtabs(weight~new.social),
                  "cell"=xtabs(weight~new.cell)
                  ))
tab2<-xtable(tab1)
print(tab2, type="html")
internet email social media cell
no 569.99 639.78 104.75 271.49
yes 427.57 359.38 327.90 728.13
tab3<- xtable(addmargins((prop.table(tab1,2)*100),1))
print(tab3, type="html")
internet email social media cell
no 57.14 64.03 24.21 27.16
yes 42.86 35.97 75.79 72.84
Sum 100.00 100.00 100.00 100.00

See the new command with digits=0 in the xtable command

Table 4.7

tab1<-(cbind("internet"=xtabs(weight ~ new.internet),
                  "email"=xtabs(weight~new.email),
                  "social media"=xtabs(weight~new.social),
                  "cell"=xtabs(weight~new.cell)
                  ))
htmlTable(txtRound(tab1,0), cgroup = c("internet & digital media users"), 
          n.cgroup = c(4), rowlabel = "User status")
internet & digital media users
User status internet email social media cell
no 570 640 105 271
yes 428 359 328 728
# tab3<- xtable(addmargins((prop.table(tab1,2)*100),1), digits=0)
# print(tab3, type="html")

With SAPPLY and weighted data set

Table 4.8

tab1<-sapply(brazil[,14:17],function(x,y) xtabs(y ~ x), y=brazil[,13])
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
new.cell new.email new.internet new.social
no 271 640 570 105
yes 728 359 428 328
tab3<- txtRound(addmargins((prop.table(tab1,2)*100),1),0)
htmlTable(tab3, rowlabel = "User status")
User status new.cell new.email new.internet new.social
no 27 64 57 24
yes 73 36 43 76
Sum 100 100 100 100

Cross Tabulations

Using the recoded variables for income, gender, education and others.

First the tabulating explanatory variables:

Gender

tab1<-cbind("Gender"=xtabs(weight ~ gender))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
Gender
male 503
female 497
tab1<-cbind("Gender"=xtabs(weight ~ gender+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no yes
male 274 229
female 296 199
tab3<-xtable(addmargins((prop.table(tab1,1)*100),2),digits=1)
print(tab3, type="html")
no yes Sum
male 54.5 45.5 100.0
female 59.9 40.1 100.0

Table 4.9 and 4.10

tab1<-cbind(xtabs(weight ~ gender+new.internet))
tab2<-txtRound(tab1, 0)
htmlTable(tab2, 
          cgroup = c("internet users"), n.cgroup = c(2),
          rgroup=c("Gender"), n.rgroup=c(2))
internet users
no yes
Gender
  male 274 229
  female 296 199
tab3<-addmargins((prop.table(tab1,1)*100),2)
tab4<-txtRound(tab3,1)
htmlTable(tab4, 
          cgroup = c("internet users"), n.cgroup = c(3),
          rowlabel = "Gender")
internet users
Gender no yes Sum
male 54.5 45.5 100.0
female 59.9 40.1 100.0

Education

tab1<-cbind("Education"=xtabs(weight ~ education))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
Education
no formal education 27
incomplete primary school 96
complete primary school 176
incomplete secondary school 112
complete secondary school 127
incomplete tertiary school 66
complete tertiary school 269
incomplete university 52
complete university 77
don’t know 0
refused 0
tab1<-cbind("Education"=xtabs(weight ~ education+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no yes
no formal education 27 0
incomplete primary school 93 3
complete primary school 160 16
incomplete secondary school 82 29
complete secondary school 70 56
incomplete tertiary school 27 39
complete tertiary school 98 171
incomplete university 7 45
complete university 7 70
don’t know 0 0
refused 0 0

Table 4.11

tab1<-cbind("Education"=xtabs(weight ~ education+new.internet))
tab2<-txtRound(addmargins((prop.table(tab1,1)*100),2),1)
htmlTable(tab2, cgroup = c("internet users"), n.cgroup = c(3),
          rowlabel = "Education attainment")
internet users
Education attainment no yes Sum
no formal education 100.0 0.0 100.0
incomplete primary school 96.8 3.2 100.0
complete primary school 91.0 9.0 100.0
incomplete secondary school 74.0 26.0 100.0
complete secondary school 55.8 44.2 100.0
incomplete tertiary school 40.6 59.4 100.0
complete tertiary school 36.4 63.6 100.0
incomplete university 13.0 87.0 100.0
complete university 9.3 90.7 100.0
don’t know
refused

Income

tab1<-cbind("income"=xtabs(weight ~ income))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
income
up to r$ 200,00 28
from r$ 201,00 to 400,00 60
from 401,00 to 600,00 167
from 601,00 to 1000,00 225
from 1001,00 to 1400,00 176
from 1401,00 to 2000,00 139
from 2001,00 to 3000,00 75
from 3001,00 to 4000,00 30
from 4001,00 to 6000,00 20
from 6001,00 to 10.000,00 10
from 10.001,00 to 15.000,00 1
over 15.000,00 0
don’t know 22
refused 48
tab1<-cbind("income"=xtabs(weight ~ income+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no yes
up to r$ 200,00 23 5
from r$ 201,00 to 400,00 43 16
from 401,00 to 600,00 135 32
from 601,00 to 1000,00 148 76
from 1001,00 to 1400,00 99 76
from 1401,00 to 2000,00 59 80
from 2001,00 to 3000,00 20 56
from 3001,00 to 4000,00 9 21
from 4001,00 to 6000,00 4 16
from 6001,00 to 10.000,00 1 9
from 10.001,00 to 15.000,00 0 1
over 15.000,00 0 0
don’t know 12 10
refused 18 30
tab3<-xtable(addmargins((prop.table(tab1,1)*100),2),digits=1)
print(tab3, type="html")
no yes Sum
up to r$ 200,00 83.8 16.2 100.0
from r$ 201,00 to 400,00 73.0 27.0 100.0
from 401,00 to 600,00 81.0 19.0 100.0
from 601,00 to 1000,00 66.0 34.0 100.0
from 1001,00 to 1400,00 56.5 43.5 100.0
from 1401,00 to 2000,00 42.4 57.6 100.0
from 2001,00 to 3000,00 26.0 74.0 100.0
from 3001,00 to 4000,00 30.4 69.6 100.0
from 4001,00 to 6000,00 18.6 81.4 100.0
from 6001,00 to 10.000,00 6.7 93.3 100.0
from 10.001,00 to 15.000,00 0.0 100.0 100.0
over 15.000,00 0.0 100.0 100.0
don’t know 53.6 46.4 100.0
refused 37.4 62.6 100.0

Table 4.12

tab1<-cbind("Education"=xtabs(weight ~ income+new.internet))
tab2<-txtRound(addmargins((prop.table(tab1,1)*100),2),1)
htmlTable(tab2, cgroup = c("internet users"), n.cgroup = c(3),
          rowlabel = "Income level")
internet users
Income level no yes Sum
up to r$ 200,00 83.8 16.2 100.0
from r$ 201,00 to 400,00 73.0 27.0 100.0
from 401,00 to 600,00 81.0 19.0 100.0
from 601,00 to 1000,00 66.0 34.0 100.0
from 1001,00 to 1400,00 56.5 43.5 100.0
from 1401,00 to 2000,00 42.4 57.6 100.0
from 2001,00 to 3000,00 26.0 74.0 100.0
from 3001,00 to 4000,00 30.4 69.6 100.0
from 4001,00 to 6000,00 18.6 81.4 100.0
from 6001,00 to 10.000,00 6.7 93.3 100.0
from 10.001,00 to 15.000,00 0.0 100.0 100.0
over 15.000,00 0.0 100.0 100.0
don’t know 53.6 46.4 100.0
refused 37.4 62.6 100.0

Age

tab1<-cbind("age"=xtabs(weight ~ new.age))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
age
middle aged 311
seniors 252
young adults 231
youth 207
tab1<-cbind("age"=xtabs(weight ~ new.age+new.internet))
tab2<-xtable(tab1, digits=0)
print(tab2, type="html")
no yes
middle aged 188 122
seniors 212 38
young adults 110 121
youth 60 147
tab3<-xtable(addmargins((prop.table(tab1,1)*100),2),digits=1)
print(tab3, type="html")
no yes Sum
middle aged 60.7 39.3 100.0
seniors 85.0 15.0 100.0
young adults 47.5 52.5 100.0
youth 28.8 71.2 100.0

Table 4.13

tab1<-cbind("Age cohort"=xtabs(weight ~ new.age+new.internet))
tab2<-txtRound(addmargins((prop.table(tab1,1)*100),2),1)
htmlTable(tab2, cgroup = c("internet users"), n.cgroup = c(3),
          rowlabel = "Age cohorts")
internet users
Age cohorts no yes Sum
middle aged 60.7 39.3 100.0
seniors 85.0 15.0 100.0
young adults 47.5 52.5 100.0
youth 28.8 71.2 100.0

Age + Gender

round(ftable(xtabs(weight~gender+new.age+new.internet)),0)
##                     new.internet  no yes
## gender new.age                          
## male   middle aged                85  68
##        seniors                   106  18
##        young adults               52  65
##        youth                      31  78
## female middle aged               103  53
##        seniors                   107  20
##        young adults               58  56
##        youth                      29  69

Merging it all in a Binary Logit model

GLM.1 –> unweighted model GLM.2 –> weighted model

GLM.1 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit), data=brazil)
GLM.2 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit), 
    weights=weight, data=brazil)
## Warning: non-integer #successes in a binomial glm!
logit1<-xtable(summary(GLM.1))
print(logit1, type="html")
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.0940 0.3513 0.27 0.7892
genderfemale -0.2467 0.1738 -1.42 0.1559
new.ageseniors -1.4793 0.2529 -5.85 0.0000
new.ageyoung adults 0.5988 0.2209 2.71 0.0067
new.ageyouth 1.6487 0.2541 6.49 0.0000
new.cellyes 1.6515 0.2411 6.85 0.0000
new.inclow -2.9382 0.3186 -9.22 0.0000
new.incmedium -1.9228 0.2736 -7.03 0.0000
logit2<-xtable(summary(GLM.2))
print(logit2, type="html")
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.1594 0.3373 0.47 0.6364
genderfemale -0.1748 0.1645 -1.06 0.2878
new.ageseniors -1.3409 0.2627 -5.11 0.0000
new.ageyoung adults 0.5527 0.2067 2.67 0.0075
new.ageyouth 1.5403 0.2244 6.86 0.0000
new.cellyes 1.6031 0.2288 7.01 0.0000
new.inclow -2.9232 0.3101 -9.43 0.0000
new.incmedium -1.9146 0.2696 -7.10 0.0000
expon<-cbind("unweighted model"=exp(coef(GLM.1)), "weighted model"=exp(coef(GLM.2)))
print(xtable(expon,digits=3), type="html")
unweighted model weighted model
(Intercept) 1.099 1.173
genderfemale 0.781 0.840
new.ageseniors 0.228 0.262
new.ageyoung adults 1.820 1.738
new.ageyouth 5.200 4.666
new.cellyes 5.215 4.969
new.inclow 0.053 0.054
new.incmedium 0.146 0.147

Stargazer with Brazil data

Merging it all in a Binary Logit model

GLM.1 –> unweighted model GLM.2 –> weighted model

GLM.1 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit), data=brazil)
GLM.2 <- glm(new.internet ~ gender + new.age + new.cell + new.inc, family=binomial(logit), 
    weights=weight, data=brazil)
## Warning: non-integer #successes in a binomial glm!
stargazer(GLM.1, GLM.2, type="html", align=TRUE, no.space=TRUE, 
          dep.var.labels=c("Internet use (unweighted data), Internet use (weighted data)"),
          covariate.labels=c("Female", "seniors","young adults", "youth", "own cell phone", 
                             "low income hhld", "medium income hhld"))
Dependent variable:
Internet use (unweighted data), Internet use (weighted data)
(1) (2)
Female -0.247 -0.175
(0.174) (0.164)
seniors -1.479*** -1.341***
(0.253) (0.263)
young adults 0.599*** 0.553***
(0.221) (0.207)
youth 1.649*** 1.540***
(0.254) (0.224)
own cell phone 1.652*** 1.603***
(0.241) (0.229)
low income hhld -2.938*** -2.923***
(0.319) (0.310)
medium income hhld -1.923*** -1.915***
(0.274) (0.270)
Constant 0.094 0.159
(0.351) (0.337)
Observations 928 928
Log Likelihood -411.062 -439.018
Akaike Inf. Crit. 838.125 894.035
Note: p<0.1; p<0.05; p<0.01