Applied econometrics: Econ 508

Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5. 1

Data

You can download the data set, called weco14.txt, from the Econ 508 web site. Save it in your preferred directory.

Then you can load it into R. If you saved it on your hard drive in a folder called econ508, you can load it by typing:

weco<-read.table("C:/econ508/weco14.txt", header=T, sep="")

or you can retrieve it from the web:

weco<-read.table("http://www.econ.uiuc.edu/~econ508/data/weco14.txt", header=T, sep="")
head(weco)
         y sex dex lex  kwit job_tenure status
1 13.72579   0  38  10 FALSE        277   TRUE
2 17.15373   1  55  11  TRUE        173   TRUE
3 13.62992   1  45  12 FALSE        410   TRUE
4 13.03998   1  41  11 FALSE        247   TRUE
5 13.19510   1  42  10 FALSE        340   TRUE
6 16.43010   0  38  12 FALSE        517   TRUE
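
Before constructing new variables it can help to check the size of the sample and how each column is stored (note from the output above that kwit and status appear to be read in as logicals). A quick, optional check:

dim(weco)   # number of observations and variables
str(weco)   # storage type of each column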

Heckman two-step procedure

We want to estimate the productivity equation using only the non-quitters. To do so we use the Heckman two-step procedure, following Lecture 21. But first we need to create a dummy variable that identifies the non-quitters, and run a probit regression:

weco$lex2 <- weco$lex^2                        # squared term in lex
weco$kwit <- as.numeric(weco$kwit)             # convert the logical kwit to 0/1
weco$nonkwit <- ifelse(weco$kwit == 1, 0, 1)   # dummy equal to 1 for non-quitters
head(weco)
         y sex dex lex kwit job_tenure status lex2 nonkwit
1 13.72579   0  38  10    0        277   TRUE  100       1
2 17.15373   1  55  11    1        173   TRUE  121       0
3 13.62992   1  45  12    0        410   TRUE  144       1
4 13.03998   1  41  11    0        247   TRUE  121       1
5 13.19510   1  42  10    0        340   TRUE  100       1
6 16.43010   0  38  12    0        517   TRUE  144       1
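
As a quick sanity check of the recoding (purely optional), you can cross-tabulate the two indicators:

table(weco$kwit, weco$nonkwit)   # nonkwit should equal 1 exactly when kwit equals 0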

After we have all the variables, we follow the “recipe” in Lecture 21:

  1. Estimate binary choice model by probit
probit<-glm(nonkwit~sex+dex+lex+lex2, data=weco, family=binomial(link="probit"))
summary(probit)

Call:
glm(formula = nonkwit ~ sex + dex + lex + lex2, family = binomial(link = "probit"), 
    data = weco)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4622   0.3041   0.5762   0.7578   1.3164  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -8.698385   2.431400  -3.578 0.000347 ***
sex         -0.271552   0.113076  -2.401 0.016328 *  
dex          0.058008   0.008255   7.027 2.12e-12 ***
lex          1.155336   0.382383   3.021 0.002516 ** 
lex2        -0.047018   0.015253  -3.083 0.002052 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 745.97  on 682  degrees of freedom
Residual deviance: 679.38  on 678  degrees of freedom
AIC: 689.38

Number of Fisher Scoring iterations: 4
  2. Construct \(\hat{\lambda}_i = \frac{\phi(x'_i\hat{\gamma})}{\Phi(x'_i\hat{\gamma})}\)
X <- cbind(1, weco$sex, weco$dex, weco$lex, weco$lex2)   # probit regressors, with an intercept
weco$lambda <- dnorm(X %*% probit$coef) / pnorm(X %*% probit$coef)
  3. Re-estimate the original model using only the uncensored observations (here, the non-quitters), but include \(\hat{\lambda}_i\) as an additional explanatory variable
weco_nk <- weco[which(weco$nonkwit == 1), ]   # keep only the non-quitters
heckit <- lm(y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)
summary(heckit)

Call:
lm(formula = y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7638 -0.7006 -0.0036  0.7645  3.2609 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 12.912068   8.058899   1.602  0.10972   
sex         -0.691889   0.215042  -3.217  0.00137 **
dex          0.074247   0.039285   1.890  0.05933 . 
lex         -0.092592   0.982103  -0.094  0.92492   
lex2         0.003787   0.040014   0.095  0.92464   
lambda      -1.623275   1.652550  -0.982  0.32642   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.154 on 516 degrees of freedom
Multiple R-squared:  0.3514,    Adjusted R-squared:  0.3451 
F-statistic:  55.9 on 5 and 516 DF,  p-value: < 2.2e-16

Then you can test for sample selectivity problems by checking the significance of the coefficient on \(\hat{\lambda}_i\), as remarked in Lecture 21. Please indicate which model you should ultimately use, based on the sample selectivity test.
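
For example, you can read the relevant t-statistic straight off the second-stage fit (a minimal sketch, assuming the fit is stored as heckit as above):

summary(heckit)$coefficients["lambda", ]   # estimate, std. error, t value, p-value of the lambda term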

Powell’s estimator

As pointed out in Lecture 21, a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we can use Powell's estimator, which is available in R via the crq.fit.pow function in the quantreg package:

require(quantreg)
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve
powell <- crq.fit.pow(X, weco$y, weco$nonkwit, left = TRUE)   # X is the design matrix built above
Warning in rq.fit.br(x, y, tau): Solution may be nonunique
powell$coef
[1]  4.92366053 -0.89661622  0.11193125  0.92806584 -0.04043428

Note that, under certain conditions, this works for any error distribution \(F\), even in the presence of heteroskedasticity.
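
Because Powell's estimator is a censored quantile regression, you can also look at quantiles other than the median through the tau argument of crq.fit.pow (0.5 by default). A minimal, illustrative sketch; the quantiles chosen here are arbitrary and not part of PS5:

taus <- c(0.25, 0.50, 0.75)
fits <- sapply(taus, function(t)
  crq.fit.pow(X, weco$y, weco$nonkwit, tau = t, left = TRUE)$coef)
colnames(fits) <- paste0("tau=", taus)   # one column of coefficients per quantile
fits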


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu