Applied econometrics: Econ 508

## TA: Nicolas Bottan

Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5. 1

# Data

You can download the data set, called weco14.txt from the Econ 508 web site. Save it in your preferred directory.

Then you can load it in R. If you saved it in your hard drive in a folder called econ508 you can load it by typing:

weco<-read.table("C:/econ508/weco14.txt", header=T, sep="")

or you can retrieve it from the web

weco<-read.table("http://www.econ.uiuc.edu/~econ508/data/weco14.txt", header=T, sep="")
head(weco)
         y sex dex lex  kwit job_tenure status
1 13.72579   0  38  10 FALSE        277   TRUE
2 17.15373   1  55  11  TRUE        173   TRUE
3 13.62992   1  45  12 FALSE        410   TRUE
4 13.03998   1  41  11 FALSE        247   TRUE
5 13.19510   1  42  10 FALSE        340   TRUE
6 16.43010   0  38  12 FALSE        517   TRUE

# Heckman two-step procedure

To estimate the equation of productivity, using only non-quitters. To do so we need to use the Heckman two-step procedure following Lecture 21. But first we need to crate a dummy variable that identifies non quitters, and run a probit regression:

weco$lex2<-weco$lex^2
weco$kwit<-as.numeric(weco$kwit)
weco$nonkwit<-ifelse(weco$kwit==1,0,1)
head(weco)
         y sex dex lex kwit job_tenure status lex2 nonkwit
1 13.72579   0  38  10    0        277   TRUE  100       1
2 17.15373   1  55  11    1        173   TRUE  121       0
3 13.62992   1  45  12    0        410   TRUE  144       1
4 13.03998   1  41  11    0        247   TRUE  121       1
5 13.19510   1  42  10    0        340   TRUE  100       1
6 16.43010   0  38  12    0        517   TRUE  144       1

After we have all the variables we follow the “recipe” in Lecture 21

1. Estimate binary choice model by probit
probit<-glm(nonkwit~sex+dex+lex+lex2, data=weco, family=binomial(link="probit"))
summary(probit)

Call:
glm(formula = nonkwit ~ sex + dex + lex + lex2, family = binomial(link = "probit"),
data = weco)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-2.4622   0.3041   0.5762   0.7578   1.3164

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.698385   2.431400  -3.578 0.000347 ***
sex         -0.271552   0.113076  -2.401 0.016328 *
dex          0.058008   0.008255   7.027 2.12e-12 ***
lex          1.155336   0.382383   3.021 0.002516 **
lex2        -0.047018   0.015253  -3.083 0.002052 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 745.97  on 682  degrees of freedom
Residual deviance: 679.38  on 678  degrees of freedom
AIC: 689.38

Number of Fisher Scoring iterations: 4
1. Construct $$\hat{\lambda_i} = \frac{\phi(x'_i\gamma)}{\Phi(x'_i\gamma)}$$
weco$lambda <- dnorm(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2)%*%(probit$coef))/pnorm(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2)%*%(probit$coef)) 1. Re estimate original model using only $$y_i > 0$$ observations but including $$\hat{\lambda_i}$$ as additional explanatory variable weco_nk<-weco[which(weco$nonkwit==1),]
lm<-lm(y~sex+dex+lex+lex2+lambda, data=weco_nk)
summary(lm)

Call:
lm(formula = y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)

Residuals:
Min      1Q  Median      3Q     Max
-3.7638 -0.7006 -0.0036  0.7645  3.2609

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.912068   8.058899   1.602  0.10972
sex         -0.691889   0.215042  -3.217  0.00137 **
dex          0.074247   0.039285   1.890  0.05933 .
lex         -0.092592   0.982103  -0.094  0.92492
lex2         0.003787   0.040014   0.095  0.92464
lambda      -1.623275   1.652550  -0.982  0.32642
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.154 on 516 degrees of freedom
Multiple R-squared:  0.3514,    Adjusted R-squared:  0.3451
F-statistic:  55.9 on 5 and 516 DF,  p-value: < 2.2e-16

Then you can test for sample selectivity problems by checking the significance of $$\hat{\lambda_i}$$, as remarked in Lecture 21. Please indicate what model you should use after all, based on the sample selectivity test.

# Powell’s estimator

As pointed out by Lecture 21 a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we could use Powell estimator which can be achieved in R by using the crq.fit.pow in the quantreg package:

require(quantreg)
Loading required package: quantreg

Attaching package: 'SparseM'

The following object is masked from 'package:base':

backsolve
powell<- crq.fit.pow(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2),weco$y,weco$nonkwit,left=TRUE)
Warning in rq.fit.br(x, y, tau): Solution may be nonunique
powell\$coef
[1]  4.92366053 -0.89661622  0.11193125  0.92806584 -0.04043428

Note that under certain conditions this works for any $$F$$ even if there is heteroskedasticity.