Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5. 1

Data

You can download the data set, called weco14.txt, from the Econ 536 web site. Save it in your preferred directory.

Then you can load it into R. If you saved it on your hard drive in a folder called econ536, you can load it by typing:

weco<-read.table("C:/econ536/weco14.txt", header=T, sep="")

or you can retrieve it from the web

weco<-read.table("http://www.econ.uiuc.edu/~econ536/Data/weco14.txt", header=T, sep="")
head(weco)
      y sex dex lex  kwit job_tenure status treatment ypost
1 13.73   0  38  10 FALSE        277   TRUE      TRUE 14.35
2 17.15   1  55  11  TRUE        173   TRUE        NA    NA
3 13.63   1  45  12 FALSE        410   TRUE      TRUE 15.75
4 13.04   1  41  11 FALSE        247   TRUE     FALSE 18.33
5 13.20   1  42  10 FALSE        340   TRUE     FALSE 13.96
6 16.43   0  38  12 FALSE        517   TRUE      TRUE 18.54

Heckman two-step procedure

We want to estimate the productivity equation using only non-quitters. To do so we need to use the Heckman two-step procedure following Lecture 21. But first we need to create a dummy variable that identifies non-quitters and run a probit regression:

weco$lex2<-weco$lex^2
weco$kwit<-as.numeric(weco$kwit)
weco$nonkwit<-ifelse(weco$kwit==1,0,1)
head(weco)
      y sex dex lex kwit job_tenure status treatment ypost lex2 nonkwit
1 13.73   0  38  10    0        277   TRUE      TRUE 14.35  100       1
2 17.15   1  55  11    1        173   TRUE        NA    NA  121       0
3 13.63   1  45  12    0        410   TRUE      TRUE 15.75  144       1
4 13.04   1  41  11    0        247   TRUE     FALSE 18.33  121       1
5 13.20   1  42  10    0        340   TRUE     FALSE 13.96  100       1
6 16.43   0  38  12    0        517   TRUE      TRUE 18.54  144       1

Once we have all the variables, we follow the “recipe” in Lecture 21:

  1. Estimate binary choice model by probit
probit<-glm(nonkwit~sex+dex+lex+lex2, data=weco, family=binomial(link="probit"))
summary(probit)

Call:
glm(formula = nonkwit ~ sex + dex + lex + lex2, family = binomial(link = "probit"), 
    data = weco)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4622   0.3041   0.5762   0.7578   1.3164  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -8.698385   2.431400  -3.578 0.000347 ***
sex         -0.271552   0.113076  -2.401 0.016328 *  
dex          0.058008   0.008255   7.027 2.12e-12 ***
lex          1.155336   0.382383   3.021 0.002516 ** 
lex2        -0.047018   0.015253  -3.083 0.002052 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 745.97  on 682  degrees of freedom
Residual deviance: 679.38  on 678  degrees of freedom
AIC: 689.38

Number of Fisher Scoring iterations: 4
  2. Construct the inverse Mills ratio \(\hat{\lambda}_i = \frac{\phi(x'_i\hat{\gamma})}{\Phi(x'_i\hat{\gamma})}\), where \(\hat{\gamma}\) are the probit coefficients
xb <- cbind(1, weco$sex, weco$dex, weco$lex, weco$lex2) %*% probit$coef  # linear predictor x'gamma-hat
weco$lambda <- dnorm(xb)/pnorm(xb)  # inverse Mills ratio
  3. Re-estimate the original model using only the \(y_i > 0\) (non-quitter) observations, but including \(\hat{\lambda}_i\) as an additional explanatory variable
weco_nk <- weco[which(weco$nonkwit == 1), ]  # keep only the non-quitters
ols2 <- lm(y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)
summary(ols2)

Call:
lm(formula = y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7671 -0.6984 -0.0005  0.7654  3.2607 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 12.914275   8.059634   1.602  0.10969   
sex         -0.691723   0.215062  -3.216  0.00138 **
dex          0.074241   0.039289   1.890  0.05937 . 
lex         -0.092993   0.982192  -0.095  0.92461   
lex2         0.003804   0.040017   0.095  0.92431   
lambda      -1.622260   1.652701  -0.982  0.32677   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.154 on 516 degrees of freedom
Multiple R-squared:  0.3512,    Adjusted R-squared:  0.3449 
F-statistic: 55.86 on 5 and 516 DF,  p-value: < 2.2e-16

Then you can test for sample selectivity problems by checking the significance of \(\hat{\lambda}_i\), as remarked in Lecture 21. Please indicate which model you should ultimately use, based on the sample selectivity test.
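
The t-test on \(\hat{\lambda}_i\) can be read directly off the OLS summary. Below is a small sketch using the objects created above; the heckit() call is an optional cross-check, assuming the sampleSelection package is installed, and it also adjusts the second-stage standard errors for the fact that \(\hat{\lambda}_i\) is itself estimated.

# estimate, std. error, t value, and p value for the inverse Mills ratio term
summary(ols2)$coefficients["lambda", ]

# optional cross-check with a packaged two-step estimator
library(sampleSelection)
heckit(nonkwit ~ sex + dex + lex + lex2, y ~ sex + dex + lex + lex2,
       data = weco, method = "2step")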

Powell’s estimator

As pointed out in Lecture 21, a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we can use Powell's estimator, which is available in R through the crq.fit.pow function in the quantreg package:

require(quantreg)
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'
The following object is masked from 'package:base':

    backsolve
powell<- crq.fit.pow(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2),weco$y,weco$nonkwit,left=TRUE)
Warning in rq.fit.br(x, y, tau): Solution may be nonunique
powell$coef
[1]  5.06256518 -0.89429392  0.11187927  0.91000879 -0.03987689
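
To help with the discussion of which model to prefer, it may be useful to line up Powell's estimates against the Heckman second-stage OLS coefficients. A small sketch using the objects created above (the first five coefficients of ols2 correspond to the same regressors; the sixth is the Mills-ratio term):

round(cbind(powell = powell$coef, heckman_2step = coef(ols2)[1:5]), 4)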

Note that, under certain conditions, this estimator works for any error distribution \(F\), even in the presence of heteroskedasticity.
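
One way to probe this in practice is to re-fit the estimator at several quantiles and compare the slopes. Below is a minimal sketch, assuming the tau argument of crq.fit.pow (its default is the median, tau = 0.5); Xp, taus, and powell_taus are illustrative names. Coefficients that change sharply across quantiles would be a symptom of heteroskedasticity or other departures from a pure location-shift model.

Xp <- cbind(1, weco$sex, weco$dex, weco$lex, weco$lex2)  # same design matrix as before
taus <- c(0.25, 0.5, 0.75)
powell_taus <- sapply(taus, function(t)
  crq.fit.pow(Xp, weco$y, weco$nonkwit, tau = t, left = TRUE)$coef)
dimnames(powell_taus) <- list(c("(Intercept)", "sex", "dex", "lex", "lex2"),
                              paste0("tau=", taus))
powell_taus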


  1. Please send comments to hrtdmrt2@illinois.edu or srmntbr2@illinois.edu