## TA: Nicolas Bottan

Welcome to a new issue of e-Tutorial. Here we will apply the Hausman-Taylor (1981) instrumental variables approach to the phuzics data of Problem Set 4. The estimation strategy is explained in Prof. Koenker’s Lecture Note 17.

# Data

The first thing you need is to download the phuzics panel data set, called phuzics10.txt from the Econ 508 web site. Save it in your preferred directory.

  infile  id  yr  phd  sex  rphd  ru  y  Y  s  using  "phuzics10.txt", clear

Note: You should drop the first observation, which contains only missing values because the first line of the .txt file holds the variable labels. Next, you should declare the data a panel data set:

    drop if id==.
    xtset id yr

Finally, you can save the data in the STATA format (I will save mine as "phuzics10.dta") and load it from a little STATA program you are going to write with your panel functions.

## PQ.do

The first step towards the panel data estimation is to transform your data into group means and deviations from group means. There is a specific STATA routine for that, called PQ.do:

    * deviations from group means (Q).
    capture program drop PQ
    program define PQ
    version 4.0
            local options "Level(integer $S_level)"
            local varlist "req ex"
            parse "`*'"
            parse "`varlist'", parse(" ")
                    sort id
                    quietly by id: gen P`1'=sum(`1')/sum(`1'~=.)
                    quietly by id: replace P`1'=P`1'[_N]
                    quietly gen Q`1'=`1'-P`1'
    end

You can download the code from the Econ 508 webpage (Routines, PQ.do) and save it. In STATA, go to "File", "Do...", and select the PQ.do file you have saved. Opening the file this way runs the code automatically. After that you can use the function by typing "PQ variablename". For example, if you type PQ y, two transformations of y will be added to your list of variables:
   PQ y 
This will generate two variables:
Py  for the group means of y (used by the between estimators), and
Qy  for the deviations from group means of y (used by the within estimators).
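
As a cross-check on what PQ produces, the same two variables can be built with STATA's built-in egen. The following is only a minimal sketch on a made-up toy panel (the five observations are hypothetical and are not taken from phuzics10.txt); it is not the PQ.do routine itself.

```stata
* Toy panel with hypothetical values (NOT the phuzics data)
clear
input id yr y
1 1 2
1 2 4
2 1 1
2 2 3
2 3 5
end

* Group means (P) and deviations from group means (Q) without PQ.do
bysort id: egen Py = mean(y)
gen Qy = y - Py

* Both groups have mean 3, so Qy is y - 3 in every row
list id yr y Py Qy, sepby(id)
```

Like PQ, egen's mean() ignores missing values when it forms the group mean, so on data with gaps the two approaches should agree.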

You should apply this function to all variables used in your estimations. For example, you will see that the PQ routine is used inside the program ht.do to run the Hausman-Taylor instrumental variables estimators.

# Estimating Phuzicists Productivity

In Problem Set 4 you are asked to explore “the phuzical revolution”. We will use this setting to see Hausman and Taylor’s approach at work. The model suggested in the Hints of the problem set is:

$\log y_{it} = \sum_{s=1}^{q} \rho_s \log y_{i,t-s} + f(t, t_{0i}, t - t_{0i}, r_i) + u_{it}$

so a working model may take the form:

$\log y_{it} = \beta_0 + \sum_{s=1}^{2} \rho_s \log y_{i,t-s} + \beta_1 e_{it} + \beta_2 e_{it}^2 + \beta_3 \frac{1}{e_{it} \times r_i} + \beta_4 \mathrm{gender}_i + \beta_5 \mathrm{d80}_i + \alpha_i + u_{it}$

In order to compute the HTIV estimators we will write our own program, ht.do. The Econ 508 webpage (Routines) provides a base version of this program, which you can download in the same way as above. Some details must be explained, though:

1) If you haven't run PQ.do yet, please do so. Otherwise the program ht.do will not work.

2) The program ht.do contains some features that should be adjusted according to the user, such as the path to access the data set, the directory where to create a log file, etc. So, don't forget to adjust the program to your machine.

3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments.

Thus, it is essential to read Professor Koenker's lecture notes and Hausman-Taylor (1981), and to study PS4 and the auxiliary papers carefully, in order to understand what the program is doing and how you need to adjust it.

To make the task easier, here is a sample of the ht.do program (with small adjustments) to compute the productivity and the wages regressions:

    use "phuzics10.dta", clear
    xtset id yr

    * Prepare variables of interest
    replace y=log(y)
    gen exp=yr-phd
    gen expsq=exp^2
    gen ier=1/(exp*rphd)
    gen d60=0
    replace d60=1 if phd>60
    gen y1=l.y
    gen y2=l2.y

    * Drop observations for first two periods since they have no lagged values
    drop if y2==.

    * Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead.
    foreach var in y y1 y2 exp expsq rphd ier d60 sex ru Y s {
        PQ `var'
    }

    * Note the effect of PQ in the time fixed variables. E.g.: Pd60=d60, Qd60=0, Psex=sex, Qsex=0.
    * Nonetheless, we need Pd60 and Psex later. Can you see where and why?

    * POOLED OLS
    reg y y1 y2 exp expsq ier d60 sex

          Source |       SS       df       MS              Number of obs =    5448
    -------------+------------------------------           F(  7,  5440) =  854.00
           Model |  1318.75786     7   188.39398           Prob > F      =  0.0000
        Residual |  1200.07751  5440  .220602484           R-squared     =  0.5236
    -------------+------------------------------           Adj R-squared =  0.5229
           Total |  2518.83537  5447  .462426175           Root MSE      =  .46968

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |   .6604158   .0123983    53.27   0.000     .6361102    .6847213
              y2 |  -.3465397   .0119992   -28.88   0.000     -.370063   -.3230164
             exp |   .1184657    .004295    27.58   0.000     .1100459    .1268855
           expsq |  -.0024504   .0001085   -22.58   0.000    -.0026631   -.0022377
             ier |    1.28344   .1907015     6.73   0.000     .9095886    1.657291
             d60 |   .0011941   .0472823     0.03   0.980    -.0914981    .0938862
             sex |  -.0091688   .0202146    -0.45   0.650    -.0487974    .0304599
           _cons |   1.030669   .0567545    18.16   0.000     .9194075     1.14193
    ------------------------------------------------------------------------------

    * WITHIN ESTIMATORS (FIXED EFFECTS)
    xtreg y y1 y2 exp expsq ier d60 sex, fe

    note: d60 omitted because of collinearity
    note: sex omitted because of collinearity

    Fixed-effects (within) regression               Number of obs      =      5448
    Group variable: id                              Number of groups   =       485

    R-sq:  within  = 0.4862                         Obs per group: min =         1
           between = 0.6140                                        avg =      11.2
           overall = 0.5021                                        max =        45

                                                    F(5,4958)          =    938.17
    corr(u_i, Xb)  = 0.0651                         Prob > F           =    0.0000

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |    .543241   .0126155    43.06   0.000     .5185092    .5679729
              y2 |  -.4260858   .0120409   -35.39   0.000    -.4496912   -.4024803
             exp |    .155017   .0059574    26.02   0.000     .1433378    .1666961
           expsq |  -.0031535   .0001339   -23.56   0.000    -.0034159   -.0028911
             ier |   2.183352   .4471742     4.88   0.000     1.306693    3.060011
             d60 |          0  (omitted)
             sex |          0  (omitted)
           _cons |    1.25111   .0688997    18.16   0.000     1.116036    1.386184
    -------------+----------------------------------------------------------------
         sigma_u |  .24755414
         sigma_e |  .44932935
             rho |  .23285611   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0:     F(484, 4958) =     3.84           Prob > F = 0.0000

    estimates store fe

    * BETWEEN ESTIMATORS
    xtreg y y1 y2 exp expsq ier d60 sex, be

    Between regression (regression on group means)  Number of obs      =      5448
    Group variable: id                              Number of groups   =       485

    R-sq:  within  = 0.3948                         Obs per group: min =         1
           between = 0.8616                                        avg =      11.2
           overall = 0.4800                                        max =        45

                                                    F(7,477)           =    424.37
    sd(u_i + avg(e_i.))=  .1462628                  Prob > F           =    0.0000

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |   1.135907   .0344792    32.94   0.000     1.068157    1.203657
              y2 |   -.394536   .0333223   -11.84   0.000    -.4600127   -.3290593
             exp |   .0762504   .0095279     8.00   0.000     .0575286    .0949723
           expsq |  -.0019336    .000312    -6.20   0.000    -.0025466   -.0013206
             ier |   .0411571   .1858239     0.22   0.825    -.3239776    .4062917
             d60 |  -.1681255   .1154429    -1.46   0.146     -.394965     .058714
             sex |    .009449   .0199837     0.47   0.637    -.0298179    .0487159
           _cons |   .3928153   .1065342     3.69   0.000     .1834808    .6021497
    ------------------------------------------------------------------------------

    estimates store be

    * GLS ESTIMATORS (RANDOM EFFECTS):
    xtreg y y1 y2 exp expsq ier d60 sex, re

    Random-effects GLS regression                   Number of obs      =      5448
    Group variable: id                              Number of groups   =       485

    R-sq:  within  = 0.4679                         Obs per group: min =         1
           between = 0.7652                                        avg =      11.2
           overall = 0.5236                                        max =        45

                                                    Wald chi2(7)       =   5977.98
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |   .6604158   .0123983    53.27   0.000     .6361156    .6847159
              y2 |  -.3465397   .0119992   -28.88   0.000    -.3700578   -.3230216
             exp |   .1184657    .004295    27.58   0.000     .1100477    .1268837
           expsq |  -.0024504   .0001085   -22.58   0.000    -.0026631   -.0022377
             ier |    1.28344   .1907015     6.73   0.000     .9096718    1.657208
             d60 |   .0011941   .0472823     0.03   0.980    -.0914775    .0938656
             sex |  -.0091688   .0202146    -0.45   0.650    -.0487886    .0304511
           _cons |   1.030669   .0567545    18.16   0.000     .9194322    1.141906
    -------------+----------------------------------------------------------------
         sigma_u |          0
         sigma_e |  .44932935
             rho |          0   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------

    estimates store re

    * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
    hausman fe re
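
For reference, the statistic behind the fixed versus random effects comparison can be written in the standard textbook form of Hausman (1978). The notation here is chosen for this note and is an assumption, not something taken from ht.do: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the within and GLS estimates of the coefficients that both models estimate, and $\widehat{V}(\cdot)$ are their estimated covariance matrices.

$$H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE}) \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE})$$

Under the null hypothesis that the individual effects are uncorrelated with the regressors, $H$ is asymptotically chi-squared with degrees of freedom equal to the number of coefficients compared. Only coefficients estimated by both models can enter the comparison, so the time-invariant d60 and sex, which the within estimator cannot identify, play no role in it.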

## Instrumental variables

    * INSTRUMENTAL VARIABLES (1ST ROUND)
    ivreg y (y1 y2 exp expsq ier d60 sex = Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)

    Instrumental variables (2SLS) regression

          Source |       SS       df       MS              Number of obs =    5448
    -------------+------------------------------           F(  7,  5440) =  703.28
           Model |  1264.03996     7  180.577137           Prob > F      =  0.0000
        Residual |  1254.79541  5440  .230660922           R-squared     =  0.5018
    -------------+------------------------------           Adj R-squared =  0.5012
           Total |  2518.83537  5447  .462426175           Root MSE      =  .48027

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |   .5435413   .0134808    40.32   0.000     .5171137     .569969
              y2 |  -.4260609   .0128697   -33.11   0.000    -.4512907   -.4008311
             exp |   .1533415   .0061232    25.04   0.000     .1413375    .1653454
           expsq |  -.0031088   .0001361   -22.84   0.000    -.0033757    -.002842
             ier |   2.151011   .4759039     4.52   0.000     1.218049    3.083973
             d60 |   .0072021   .0487178     0.15   0.882    -.0883042    .1027085
             sex |  -.0116087   .0206865    -0.56   0.575    -.0521625     .028945
           _cons |   1.258208   .0790441    15.92   0.000      1.10325    1.413167
    ------------------------------------------------------------------------------
    Instrumented:  y1 y2 exp expsq ier d60 sex
    Instruments:   Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex
    ------------------------------------------------------------------------------

    predict r,res
    PQ r
    gen Prsq=Pr^2
    quietly bys id: gen mark=_n
    *What does mark do? (see next regression)
    quietly by id: gen T=_N
    gen iT=1/T
    reg Prsq iT if mark==1

          Source |       SS       df       MS              Number of obs =     485
    -------------+------------------------------           F(  1,   483) =   41.37
           Model |  .358413506     1  .358413506           Prob > F      =  0.0000
        Residual |  4.18495546   483  .008664504           R-squared     =  0.0789
    -------------+------------------------------           Adj R-squared =  0.0770
           Total |  4.54336897   484  .009387126           Root MSE      =  .09308

    ------------------------------------------------------------------------------
            Prsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              iT |   .1329607    .020673     6.43   0.000     .0923406    .1735808
           _cons |   .0375626   .0055915     6.72   0.000     .0265759    .0485494
    ------------------------------------------------------------------------------

    matrix b=get(_b)
    gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))

    *Now you need to transform the variables included in your model
    replace y=y-(1-theta)*Py
    replace y1=y1-(1-theta)*Py1
    replace y2=y2-(1-theta)*Py2
    replace exp=exp-(1-theta)*Pexp
    replace expsq=expsq-(1-theta)*Pexpsq
    replace ier=ier-(1-theta)*Pier
    replace d60=d60-(1-theta)*Pd60
    replace sex=sex-(1-theta)*Psex

    * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
    ivreg y (y1 y2 exp expsq ier d60 sex theta = Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant

    Instrumental variables (2SLS) regression

          Source |       SS       df       MS              Number of obs =    5448
    -------------+------------------------------           F(  8,  5440) =       .
           Model |  11578.2685     8  1447.28356           Prob > F      =       .
        Residual |  1064.92083  5440  .195757505           R-squared     =       .
    -------------+------------------------------           Adj R-squared =       .
           Total |  12643.1893  5448  2.32070288           Root MSE      =  .44244

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              y1 |   .5432826   .0124213    43.74   0.000      .518932    .5676332
              y2 |  -.4260937   .0118562   -35.94   0.000    -.4493365   -.4028509
             exp |   .1548236   .0058089    26.65   0.000     .1434359    .1662114
           expsq |  -.0031484   .0001305   -24.13   0.000    -.0034042   -.0028926
             ier |    2.18223   .4399483     4.96   0.000     1.319756    3.044705
             d60 |   .0182557   .1448914     0.13   0.900    -.2657896    .3023009
             sex |  -.0125364   .0403387    -0.31   0.756    -.0916165    .0665436
           theta |   1.236176   .1561126     7.92   0.000     .9301332     1.54222
    ------------------------------------------------------------------------------
    Instrumented:  y1 y2 exp expsq ier d60 sex theta
    Instruments:   Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta
    ------------------------------------------------------------------------------

Why do we have theta as a variable and no intercept here?

    matrix list b
    sum theta

    . matrix list b

    b[1,2]
               iT      _cons
    y1  .13296069  .03756264

    . summarize theta

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           theta |      5448    .4519689    .1071866   .2700443   .8830183
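
The reported values of theta can be traced back to the auxiliary regression of Prsq on iT. What follows is a sketch of the usual random-effects reasoning, under the assumption of an individual effect with variance $\sigma_\alpha^2$ and an idiosyncratic error with variance $\sigma_u^2$. The symbols $\sigma_\alpha^2$, $\sigma_u^2$, $\bar r_i$ (the group mean of the first-round residuals, Pr in the program) and $T_i$ (the number of observations for individual $i$, T in the program) are notation introduced here, not names used in ht.do. Approximately,

$$E[\bar r_i^2] = \sigma_\alpha^2 + \frac{\sigma_u^2}{T_i}$$

so the intercept of that regression estimates $\sigma_\alpha^2$ and the coefficient on iT estimates $\sigma_u^2$. The line that generates theta then computes

$$\theta_i = \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T_i \, \sigma_\alpha^2}}$$

and each variable is transformed as $x_{it} - (1 - \theta_i)\bar x_i$. Plugging in the two coefficients listed in matrix b for the shortest and longest panels ($T_i = 1$ and $T_i = 45$, the minimum and maximum observations per group reported by xtreg):

$$\theta(T_i = 1) = \sqrt{\frac{0.13296}{0.13296 + 0.03756}} \approx 0.883, \qquad \theta(T_i = 45) = \sqrt{\frac{0.13296}{0.13296 + 45 \times 0.03756}} \approx 0.270$$

These agree with the maximum and minimum of theta in the summary table.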

# References:

Hausman, Jerry, 1978, “Specification Tests in Econometrics,” Econometrica, 46, pp. 1251-1271.

Hausman, Jerry, and William Taylor, 1981, “Panel Data and Unobservable Individual Effects,” Econometrica, 49, No. 6, pp. 1377-1398.

Koenker, Roger, 2014, “Panel Data,” Lecture 17, mimeo, University of Illinois at Urbana-Champaign.