Applied econometrics: Econ 508

Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5. 1

Data

You can download the data set, called weco14.txt, from the Econ 508 web site. Save it in your preferred directory.

Then you can load it into R. If you saved it on your hard drive in a folder called econ508, you can load it by typing:

weco<-read.table("C:/econ508/weco14.txt", header=T, sep="")

or you can retrieve it from the web:

weco<-read.table("http://www.econ.uiuc.edu/~econ508/data/weco14.txt", header=T, sep="")
head(weco)
         y sex dex lex  kwit job_tenure status
1 13.72579   0  38  10 FALSE        277   TRUE
2 17.15373   1  55  11  TRUE        173   TRUE
3 13.62992   1  45  12 FALSE        410   TRUE
4 13.03998   1  41  11 FALSE        247   TRUE
5 13.19510   1  42  10 FALSE        340   TRUE
6 16.43010   0  38  12 FALSE        517   TRUE
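
Before constructing new variables it can help to check the size of the sample and how each column is stored (note from the output above that kwit and status appear to be read in as logicals). A quick, optional check:

dim(weco)   # number of observations and variables
str(weco)   # storage type of each column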

Heckman two-step procedure

We want to estimate the productivity equation using only the non-quitters. To do so we use the Heckman two-step procedure, following Lecture 21. But first we need to create a dummy variable that identifies the non-quitters, and run a probit regression:

weco$lex2 <- weco$lex^2                        # squared term in lex
weco$kwit <- as.numeric(weco$kwit)             # convert the logical kwit to 0/1
weco$nonkwit <- ifelse(weco$kwit == 1, 0, 1)   # dummy equal to 1 for non-quitters
head(weco)
         y sex dex lex kwit job_tenure status lex2 nonkwit
1 13.72579   0  38  10    0        277   TRUE  100       1
2 17.15373   1  55  11    1        173   TRUE  121       0
3 13.62992   1  45  12    0        410   TRUE  144       1
4 13.03998   1  41  11    0        247   TRUE  121       1
5 13.19510   1  42  10    0        340   TRUE  100       1
6 16.43010   0  38  12    0        517   TRUE  144       1
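
As a quick sanity check of the recoding (purely optional), you can cross-tabulate the two indicators:

table(weco$kwit, weco$nonkwit)   # nonkwit should equal 1 exactly when kwit equals 0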

After we have all the variables, we follow the “recipe” in Lecture 21:

  1. Estimate binary choice model by probit
probit<-glm(nonkwit~sex+dex+lex+lex2, data=weco, family=binomial(link="probit"))
summary(probit)

Call:
glm(formula = nonkwit ~ sex + dex + lex + lex2, family = binomial(link = "probit"), 
    data = weco)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4622   0.3041   0.5762   0.7578   1.3164  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -8.698385   2.431400  -3.578 0.000347 ***
sex         -0.271552   0.113076  -2.401 0.016328 *  
dex          0.058008   0.008255   7.027 2.12e-12 ***
lex          1.155336   0.382383   3.021 0.002516 ** 
lex2        -0.047018   0.015253  -3.083 0.002052 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 745.97  on 682  degrees of freedom
Residual deviance: 679.38  on 678  degrees of freedom
AIC: 689.38

Number of Fisher Scoring iterations: 4
  2. Construct \(\hat{\lambda}_i = \frac{\phi(x'_i\hat{\gamma})}{\Phi(x'_i\hat{\gamma})}\)
X <- cbind(1, weco$sex, weco$dex, weco$lex, weco$lex2)   # probit regressors, with an intercept
weco$lambda <- dnorm(X %*% probit$coef) / pnorm(X %*% probit$coef)
  3. Re-estimate the original model using only the uncensored observations (here, the non-quitters), but include \(\hat{\lambda}_i\) as an additional explanatory variable
weco_nk <- weco[which(weco$nonkwit == 1), ]   # keep only the non-quitters
heckit <- lm(y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)
summary(heckit)

Call:
lm(formula = y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7638 -0.7006 -0.0036  0.7645  3.2609 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 12.912068   8.058899   1.602  0.10972   
sex         -0.691889   0.215042  -3.217  0.00137 **
dex          0.074247   0.039285   1.890  0.05933 . 
lex         -0.092592   0.982103  -0.094  0.92492   
lex2         0.003787   0.040014   0.095  0.92464   
lambda      -1.623275   1.652550  -0.982  0.32642   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.154 on 516 degrees of freedom
Multiple R-squared:  0.3514,    Adjusted R-squared:  0.3451 
F-statistic:  55.9 on 5 and 516 DF,  p-value: < 2.2e-16

Then you can test for sample selectivity problems by checking the significance of the coefficient on \(\hat{\lambda}_i\), as remarked in Lecture 21. Please indicate which model you should ultimately use, based on the sample selectivity test.
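
For example, you can read the relevant t-statistic straight off the second-stage fit (a minimal sketch, assuming the fit is stored as heckit as above):

summary(heckit)$coefficients["lambda", ]   # estimate, std. error, t value, p-value of the lambda term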

Powell’s estimator

As pointed out in Lecture 21, a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we can use Powell's estimator, which is available in R via the crq.fit.pow function in the quantreg package:

require(quantreg)
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve
powell <- crq.fit.pow(X, weco$y, weco$nonkwit, left = TRUE)   # X is the design matrix built above
Warning in rq.fit.br(x, y, tau): Solution may be nonunique
powell$coef
[1]  4.92366053 -0.89661622  0.11193125  0.92806584 -0.04043428

Note that, under certain conditions, this works for any error distribution \(F\), even in the presence of heteroskedasticity.
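
Because Powell's estimator is a censored quantile regression, you can also look at quantiles other than the median through the tau argument of crq.fit.pow (0.5 by default). A minimal, illustrative sketch; the quantiles chosen here are arbitrary and not part of PS5:

taus <- c(0.25, 0.50, 0.75)
fits <- sapply(taus, function(t)
  crq.fit.pow(X, weco$y, weco$nonkwit, tau = t, left = TRUE)$coef)
colnames(fits) <- paste0("tau=", taus)   # one column of coefficients per quantile
fits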


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu