
Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. This e-TA will focus on Censored Regression Models, with special emphasis on helping answer question 4 of PS5.[1]

Data

You can download the data set, called weco14.csv, from the Econ 508 web site. Save it in your preferred directory.

See the first section of e-TA 13 on Cubic B-Splines and Quantile Regression for a description of how to prepare the data and save it in Stata format.

   use weco14.dta, clear
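
If weco14.dta has not been created yet, the conversion can be sketched as follows. This is only a minimal sketch, assuming Stata 13 or later (where `import delimited` is available) and that weco14.csv sits in the current working directory; it is not necessarily the exact sequence used in e-TA 13.

```stata
* Read the comma-separated file into memory, replacing any data already loaded
import delimited using "weco14.csv", clear

* Inspect variable names and storage types before saving
describe

* Save in Stata format so that later sessions can simply -use- the file
save weco14.dta, replace
```

In older versions of Stata, `insheet using weco14.csv, clear` plays the same role as the first command.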

Heckman two-step procedure

We want to estimate the productivity equation using only non-quitters. To do so we need to use the Heckman two-step procedure, following Lecture 21. But first we need to create a dummy variable that identifies non-quitters, and run a probit regression:

   gen lex2 = lex^2
   gen nonkwit = (kwit == 0)
   list in 1/5

     +---------------------------------------------------------------------------------+
     |     y   sex   dex   lex   kwit   job_te~e   status   treatm~t   ypost   nonkwit |
     |---------------------------------------------------------------------------------|
  1. | 13.73     0    38    10      0        277        1          1   14.35         1 |
  2. | 17.15     1    55    11      1        173        1          .       .         0 |
  3. | 13.63     1    45    12      0        410        1          1   15.75         1 |
  4. | 13.04     1    41    11      0        247        1          0   18.33         1 |
  5. |  13.2     1    42    10      0        340        1          0   13.96         1 |
     +---------------------------------------------------------------------------------+
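
Before moving on, it may be worth checking that the new dummy matches the original indicator. The lines below are a small sketch of such a check; the variable name `nonkwit_chk` is hypothetical and introduced only for illustration. Note that `(kwit == 0)` evaluates to 0 when `kwit` is missing, so any missing values of `kwit` would be counted as quitters.

```stata
* Cross-tabulate the original indicator against the new dummy, showing missing values
tabulate kwit nonkwit, missing

* Count observations with a missing quit indicator (they were coded nonkwit = 0 above)
count if missing(kwit)

* Hypothetical alternative that leaves the dummy missing whenever kwit is missing
gen nonkwit_chk = (kwit == 0) if !missing(kwit)
```

If the count is zero, `nonkwit` and `nonkwit_chk` are identical and either one can be used.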

Once we have all the variables, we follow the “recipe” in Lecture 21:

  1. Estimate binary choice model by probit
   probit nonkwit sex dex lex lex2

Iteration 0:   log likelihood = -372.98741
Iteration 1:   log likelihood = -339.87696
Iteration 2:   log likelihood = -339.69113
Iteration 3:   log likelihood = -339.69113

Probit regression                                 Number of obs   =        683
                                                  LR chi2(4)      =      66.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -339.69113                       Pseudo R2       =     0.0893

------------------------------------------------------------------------------
     nonkwit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |  -.2715531    .113097    -2.40   0.016    -.4932191   -.0498871
         dex |   .0580076   .0082217     7.06   0.000     .0418934    .0741219
         lex |   1.155319   .3942607     2.93   0.003     .3825825    1.928056
        lex2 |  -.0470172   .0157425    -2.99   0.003     -.077872   -.0161624
       _cons |  -8.698287   2.499655    -3.48   0.001    -13.59752   -3.799054
------------------------------------------------------------------------------
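  2. Construct \(\hat{\lambda}_i = \frac{\phi(x'_i\hat{\gamma})}{\Phi(x'_i\hat{\gamma})}\)
   * linear index x'gamma-hat from the probit above
   predict xb, xb
   * standard normal density and distribution function evaluated at the index
   gen smallphi = normalden(xb)
   gen largephi = normal(xb)
   * inverse Mills ratio
   gen lambda = smallphi/largephi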
  3. Re-estimate the original model using only the selected observations (the \(y_i > 0\) observations in the notation of Lecture 21; here, the non-quitters), including \(\hat{\lambda}_i\) as an additional explanatory variable
   reg y sex dex lex lex2 lambda if nonkwit==1

      Source |       SS       df       MS              Number of obs =     522
-------------+------------------------------           F(  5,   516) =   55.86
       Model |  371.793456     5  74.3586912           Prob > F      =  0.0000
    Residual |   686.84747   516  1.33109975           R-squared     =  0.3512
-------------+------------------------------           Adj R-squared =  0.3449
       Total |  1058.64093   521  2.03194036           Root MSE      =  1.1537

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |  -.6917229    .215062    -3.22   0.001    -1.114228    -.269218
         dex |   .0742407   .0392891     1.89   0.059    -.0029455    .1514269
         lex |  -.0929801   .9821782    -0.09   0.925     -2.02254     1.83658
        lex2 |   .0038033   .0400168     0.10   0.924    -.0748125    .0824192
      lambda |  -1.622262   1.652698    -0.98   0.327    -4.869106    1.624581
       _cons |    12.9142    8.05954     1.60   0.110    -2.919347    28.74774
------------------------------------------------------------------------------
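
The reason the inverse Mills ratio enters the last step can be sketched as follows. This assumes the standard bivariate normal selection setup; the symbols \(d_i\), \(u_i\), \(v_i\), \(\beta\) and \(\sigma_{uv}\) are notation introduced here for illustration and may differ from the notation of Lecture 21. Here \(d_i\) plays the role of nonkwit, \(Var(v_i) = 1\) and \(Cov(u_i, v_i) = \sigma_{uv}\).

```latex
\[
\begin{aligned}
d_i &= 1\{x_i'\gamma + v_i > 0\}, \qquad
y_i = x_i'\beta + u_i \ \text{ used only if } d_i = 1, \\
E[y_i \mid x_i, d_i = 1]
  &= x_i'\beta + E[u_i \mid v_i > -x_i'\gamma] \\
  &= x_i'\beta + \sigma_{uv}\,\frac{\phi(x_i'\gamma)}{\Phi(x_i'\gamma)}
   = x_i'\beta + \sigma_{uv}\,\lambda_i .
\end{aligned}
\]
```

Under these assumptions the coefficient on lambda in the regression on the selected sample estimates \(\sigma_{uv}\), which is zero when the selection and outcome errors are uncorrelated.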

You can then test for sample selectivity by checking the significance of the coefficient on \(\hat{\lambda}_i\), as remarked in Lecture 21. Please indicate which model you should ultimately use, based on the sample selectivity test.
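
As a cross-check on the manual steps, the same two-step estimator can be requested from Stata's built-in `heckman` command. The sketch below assumes that `lambda`, `lex2` and `nonkwit` have already been created as above. The point estimates from `heckman, twostep` should agree with the manual regression up to rounding, while its standard errors will generally differ because they account for the fact that lambda is itself estimated.

```stata
* Manual version: explicit Wald test on the inverse Mills ratio
quietly reg y sex dex lex lex2 lambda if nonkwit==1
test lambda

* Built-in version: same regressors in the selection and outcome equations;
* the row labelled "lambda" in the output is the coefficient on the Mills ratio
heckman y sex dex lex lex2, select(nonkwit = sex dex lex lex2) twostep
```

Because the selection and outcome equations use the same regressors here, the coefficient on lambda is identified only through the nonlinearity of the inverse Mills ratio, which is one reason its standard error can be large.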

Powell’s estimator

As pointed out in Lecture 21, a problem with the Gaussian MLE is that it can perform poorly in non-Gaussian and/or heteroscedastic circumstances. In that case we can use Powell's estimator (censored least absolute deviations), which can be implemented in Stata with the clad command. To download the command, type findit clad, select sg153 and click install. The syntax is

   clad depvar indepvars, reps(#) [ll(#) or ul(#)] 
where reps(#) sets the number of bootstrap replications, and you must specify ll(#) if the censoring is at the bottom of the distribution or ul(#) if it is at the top; in both cases # is the value at which censoring occurs.

Note that under certain conditions (notably, that the errors have conditional median zero) this estimator is consistent for any error distribution \(F\), even if there is heteroskedasticity.
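
To see the syntax in action without touching the problem set data, here is a small simulated sketch. Everything in it is hypothetical and for illustration only: the variable names (`x`, `ystar`, `ycens`), the seed, the sample size and the data-generating process. It assumes the user-written clad command (sg153) has already been installed as described above.

```stata
* Simulated data with heteroskedastic errors and left-censoring at zero
clear
set seed 508
set obs 500
gen x = rnormal()
* latent outcome: conditional median is 1 + 2*x, error variance depends on x
gen ystar = 1 + 2*x + rnormal()*exp(0.5*x)
* observed outcome: censored from below at 0
gen ycens = max(ystar, 0)

* Powell's estimator with 50 bootstrap replications, lower censoring point 0
clad ycens x, reps(50) ll(0)

* Gaussian tobit MLE on the same data, for comparison
tobit ycens x, ll(0)
```

Since the simulated errors are heteroskedastic but have median zero, comparing the two sets of estimates with the values 1 and 2 used to generate the data gives a rough sense of the point made in the text.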


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu