## TA: Nicolas Bottan

Welcome to a new issue of e-Tutorial. This e-TA focuses on Censored Regression Models, with special emphasis on helping you answer question 4 of PS5.

# Data

You can download the data set, called weco14.csv, from the Econ 508 web site. Save it in your preferred directory.

See the first section of e-TA 13 on Cubic B-Splines and Quantile Regression for a description of how to prepare the data and save it in Stata format.

   use weco14.dta, clear

# Heckman two-step procedure

We want to estimate the productivity equation using only non-quitters. To do so, we use the Heckman two-step procedure from Lecture 21. First, we need to create a dummy variable that identifies non-quitters and then run a probit regression:

   gen lex2 = lex^2
   gen nonkwit = (kwit == 0)
   list in 1/5

        +---------------------------------------------------------------------------------+
        |     y   sex   dex   lex   kwit   job_te~e   status   treatm~t   ypost   nonkwit |
        |---------------------------------------------------------------------------------|
     1. | 13.73     0    38    10      0        277        1          1   14.35         1 |
     2. | 17.15     1    55    11      1        173        1          .       .         0 |
     3. | 13.63     1    45    12      0        410        1          1   15.75         1 |
     4. | 13.04     1    41    11      0        247        1          0   18.33         1 |
     5. |  13.2     1    42    10      0        340        1          0   13.96         1 |
        +---------------------------------------------------------------------------------+

After we have all the variables, we follow the “recipe” in Lecture 21:

1. Estimate binary choice model by probit
   probit nonkwit sex dex lex lex2

   Iteration 0:   log likelihood = -372.98741
   Iteration 1:   log likelihood = -339.87696
   Iteration 2:   log likelihood = -339.69113
   Iteration 3:   log likelihood = -339.69113

   Probit regression                                 Number of obs   =        683
                                                     LR chi2(4)      =      66.59
                                                     Prob > chi2     =     0.0000
   Log likelihood = -339.69113                       Pseudo R2       =     0.0893

   ------------------------------------------------------------------------------
        nonkwit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
   -------------+----------------------------------------------------------------
            sex |  -.2715531    .113097    -2.40   0.016    -.4932191   -.0498871
            dex |   .0580076   .0082217     7.06   0.000     .0418934    .0741219
            lex |   1.155319   .3942607     2.93   0.003     .3825825    1.928056
           lex2 |  -.0470172   .0157425    -2.99   0.003     -.077872   -.0161624
          _cons |  -8.698287   2.499655    -3.48   0.001    -13.59752   -3.799054
   ------------------------------------------------------------------------------

1. Construct $$\hat{\lambda}_i = \frac{\phi(x'_i\hat{\gamma})}{\Phi(x'_i\hat{\gamma})}$$, where $$\hat{\gamma}$$ is the probit estimate from the previous step
   predict xb, xb
   gen smallphi = normalden(xb)
   gen largephi = normal(xb)
   gen lambda = smallphi/largephi
1. Re-estimate the original model using only the selected observations ($$y_i > 0$$ in the lecture's notation; here, the non-quitters), including $$\hat{\lambda}_i$$ as an additional explanatory variable
   reg y sex dex lex lex2 lambda if nonkwit==1

         Source |       SS       df       MS              Number of obs =     522
   -------------+------------------------------           F(  5,   516) =   55.86
          Model |  371.793456     5  74.3586912           Prob > F      =  0.0000
       Residual |   686.84747   516  1.33109975           R-squared     =  0.3512
   -------------+------------------------------           Adj R-squared =  0.3449
          Total |  1058.64093   521  2.03194036           Root MSE      =  1.1537

   ------------------------------------------------------------------------------
              y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
   -------------+----------------------------------------------------------------
            sex |  -.6917229    .215062    -3.22   0.001    -1.114228    -.269218
            dex |   .0742407   .0392891     1.89   0.059    -.0029455    .1514269
            lex |  -.0929801   .9821782    -0.09   0.925     -2.02254     1.83658
           lex2 |   .0038033   .0400168     0.10   0.924    -.0748125    .0824192
         lambda |  -1.622262   1.652698    -0.98   0.327    -4.869106    1.624581
          _cons |    12.9142    8.05954     1.60   0.110    -2.919347    28.74774
   ------------------------------------------------------------------------------
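
To see why $$\hat{\lambda}_i$$ enters the second step, it may help to write out the conditional mean for the selected sample. The sketch below assumes the standard textbook selection model with jointly normal errors and the same regressors in both equations; the symbols $$d_i$$, $$u_i$$, $$v_i$$ and $$\sigma_{uv}$$ are notation introduced here and may differ from the notation of Lecture 21.

```latex
% Assumed setup (notation introduced for this sketch):
%   selection:  d_i = 1(x_i'\gamma + v_i > 0)
%   outcome:    y_i = x_i'\beta + u_i, observed only when d_i = 1
%   (u_i, v_i) jointly normal, Var(v_i) = 1, Cov(u_i, v_i) = \sigma_{uv}
\begin{aligned}
E[y_i \mid x_i, d_i = 1]
  &= x_i'\beta + E[u_i \mid v_i > -x_i'\gamma] \\
  &= x_i'\beta + \sigma_{uv}\,\frac{\phi(x_i'\gamma)}{\Phi(x_i'\gamma)} \\
  &= x_i'\beta + \sigma_{uv}\,\lambda_i .
\end{aligned}
```

Under these assumptions, the coefficient on lambda in the regression above is an estimate of $$\sigma_{uv}$$.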

You can then test for sample selectivity by checking the significance of the coefficient on $$\hat{\lambda}_i$$, as remarked in Lecture 21. Based on the result of this test, indicate which model you should use.
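
As a cross-check, Stata's built-in heckman command with the twostep option runs both steps in a single call. Unlike the manual regression above, it reports standard errors that account for $$\hat{\lambda}_i$$ being an estimated regressor. The sketch below assumes the same variables as above and the same regressors in both equations; it is meant as a check, not as the procedure prescribed in Lecture 21.

```stata
* Two-step Heckman estimates in one call (a cross-check, not the lecture's recipe).
* select() names the selection indicator and the regressors of the probit step.
heckman y sex dex lex lex2, select(nonkwit = sex dex lex lex2) twostep
```

The output includes the coefficient on the inverse Mills ratio (labelled lambda), which can be compared with the lambda row of the manual second-step regression.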

# Powell’s estimator

As pointed out in Lecture 21, a problem with the Gaussian MLE is that it can perform poorly under non-Gaussian and/or heteroskedastic errors. In that case we can use Powell's estimator, which is implemented in Stata by the user-written clad command. To install it, type findit clad, select sg153, and click install. The syntax is:

   clad depvar indepvars, reps(#) [ll(#) or ul(#)]

where reps(#) sets the number of bootstrap replications, and you specify either ll(#), if the data are censored from below, or ul(#), if they are censored from above, with # being the value at which censoring occurs.

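Note that, under certain regularity conditions (in particular, that the errors have conditional median zero), Powell's estimator is consistent for any error distribution $$F$$, even in the presence of heteroskedasticity.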
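
For reference, Powell's censored least absolute deviations (CLAD) estimator can be written as below for censoring from below at zero. This is the standard textbook form rather than a formula taken from Lecture 21, and the symbols $$\beta$$ and $$n$$ are notation assumed here. For a general lower limit $$c$$, as set through ll(#), replace 0 with $$c$$.

```latex
% Powell's CLAD estimator, left censoring at 0 (standard form; notation assumed)
\hat{\beta}_{\mathrm{CLAD}}
  = \arg\min_{\beta} \sum_{i=1}^{n}
    \left| \, y_i - \max\{0,\; x_i'\beta\} \, \right|
```

Because the median, unlike the mean, passes through the monotone transformation $$\max\{0, \cdot\}$$, the conditional median of $$y_i$$ is $$\max\{0, x_i'\beta\}$$, so the estimator amounts to a median regression on that censored function.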

Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu