Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5. 1
You can download the data set, called weco14.txt from the Econ 536 web site. Save it in your preferred directory.
Then you can load it in R. If you saved it in your hard drive in a folder called econ536 you can load it by typing:
weco<-read.table("C:/econ536/weco14.txt", header=T, sep="")
or you can retrieve it from the web
weco<-read.table("http://www.econ.uiuc.edu/~econ536/Data/weco14.txt", header=T, sep="")
head(weco)
y sex dex lex kwit job_tenure status treatment ypost
1 13.73 0 38 10 FALSE 277 TRUE TRUE 14.35
2 17.15 1 55 11 TRUE 173 TRUE NA NA
3 13.63 1 45 12 FALSE 410 TRUE TRUE 15.75
4 13.04 1 41 11 FALSE 247 TRUE FALSE 18.33
5 13.20 1 42 10 FALSE 340 TRUE FALSE 13.96
6 16.43 0 38 12 FALSE 517 TRUE TRUE 18.54
To estimate the equation of productivity, using only non-quitters. To do so we need to use the Heckman two-step procedure following Lecture 21. But first we need to crate a dummy variable that identifies non quitters, and run a probit regression:
weco$lex2<-weco$lex^2
weco$kwit<-as.numeric(weco$kwit)
weco$nonkwit<-ifelse(weco$kwit==1,0,1)
head(weco)
y sex dex lex kwit job_tenure status treatment ypost lex2 nonkwit
1 13.73 0 38 10 0 277 TRUE TRUE 14.35 100 1
2 17.15 1 55 11 1 173 TRUE NA NA 121 0
3 13.63 1 45 12 0 410 TRUE TRUE 15.75 144 1
4 13.04 1 41 11 0 247 TRUE FALSE 18.33 121 1
5 13.20 1 42 10 0 340 TRUE FALSE 13.96 100 1
6 16.43 0 38 12 0 517 TRUE TRUE 18.54 144 1
After we have all the variables we follow the “recipe” in Lecture 21
probit<-glm(nonkwit~sex+dex+lex+lex2, data=weco, family=binomial(link="probit"))
summary(probit)
Call:
glm(formula = nonkwit ~ sex + dex + lex + lex2, family = binomial(link = "probit"),
data = weco)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4622 0.3041 0.5762 0.7578 1.3164
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.698385 2.431400 -3.578 0.000347 ***
sex -0.271552 0.113076 -2.401 0.016328 *
dex 0.058008 0.008255 7.027 2.12e-12 ***
lex 1.155336 0.382383 3.021 0.002516 **
lex2 -0.047018 0.015253 -3.083 0.002052 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 745.97 on 682 degrees of freedom
Residual deviance: 679.38 on 678 degrees of freedom
AIC: 689.38
Number of Fisher Scoring iterations: 4
weco$lambda <- dnorm(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2)%*%(probit$coef))/pnorm(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2)%*%(probit$coef))
weco_nk<-weco[which(weco$nonkwit==1),]
lm<-lm(y~sex+dex+lex+lex2+lambda, data=weco_nk)
summary(lm)
Call:
lm(formula = y ~ sex + dex + lex + lex2 + lambda, data = weco_nk)
Residuals:
Min 1Q Median 3Q Max
-3.7671 -0.6984 -0.0005 0.7654 3.2607
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.914275 8.059634 1.602 0.10969
sex -0.691723 0.215062 -3.216 0.00138 **
dex 0.074241 0.039289 1.890 0.05937 .
lex -0.092993 0.982192 -0.095 0.92461
lex2 0.003804 0.040017 0.095 0.92431
lambda -1.622260 1.652701 -0.982 0.32677
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.154 on 516 degrees of freedom
Multiple R-squared: 0.3512, Adjusted R-squared: 0.3449
F-statistic: 55.86 on 5 and 516 DF, p-value: < 2.2e-16
Then you can test for sample selectivity problems by checking the significance of \(\hat{\lambda_i}\), as remarked in Lecture 21. Please indicate what model you should use after all, based on the sample selectivity test.
As pointed out by Lecture 21 a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we could use Powell estimator which can be achieved in R by using the crq.fit.pow
in the quantreg
package:
require(quantreg)
Loading required package: quantreg
Loading required package: SparseM
Attaching package: 'SparseM'
The following object is masked from 'package:base':
backsolve
powell<- crq.fit.pow(cbind(1,weco$sex, weco$dex,weco$lex,weco$lex2),weco$y,weco$nonkwit,left=TRUE)
Warning in rq.fit.br(x, y, tau): Solution may be nonunique
powell$coef
[1] 5.06256518 -0.89429392 0.11187927 0.91000879 -0.03987689
Note that under certain conditions this works for any \(F\) even if there is heteroskedasticity.
Please send comments to hrtdmrt2@illinois.edu or srmntbr2@illinois.edu↩