Welcome to a new issue of e-Tutorial. Here we apply the Hausman-Taylor (1981) instrumental variables approach to the phuzics data of Problem Set 4. The estimation strategy is explained in Prof. Koenker’s Lecture Note 17. ¹

Data

First, download the phuzics panel data set, phuzics10.txt, from the Econ 508 web site and save it in your preferred directory.

Next, load the data into Stata. After setting your working directory to the folder where the data are saved, run:

  infile  id  yr  phd  sex  rphd  ru  y  Y  s  using  "phuzics10.txt", clear

Note: The first line of the .txt file holds the variable labels, so it is read in as an observation with missing values and should be dropped. Then declare the data to be a panel data set:

  drop if id==.
  xtset id yr

Finally, you can save the data in Stata format (I will save mine as "phuzics10.dta") and load it from the short Stata program you are going to write with your panel functions.
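The saving step is not shown in the original text. A minimal sketch, assuming the infile, drop, and xtset commands above have already been run in the current session and that you want the file in the working directory:

```stata
* Save the cleaned panel in Stata format; replace overwrites any existing copy
save "phuzics10.dta", replace

* Optional check: describe the panel pattern (number of ids, span of yr)
xtdescribe
```

The xtdescribe call is only a sanity check on the panel declaration; it does not change the data.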

PQ.do

The first step towards panel data estimation is to transform your data into group means and deviations from group means. The course provides a Stata routine for this, called PQ.do:

* deviations from group means (Q). 
capture program drop PQ 
program define PQ 
version 4.0 
        local options "Level(integer $S_level)" 
        local varlist "req ex" 
        parse "`*'" 
        parse "`varlist'",parse(" ") 
                sort id 
                quietly by id: gen P`1'=sum(`1')/sum(`1'~=.) 
                quietly by id: replace P`1'=P`1'[_N] 
                quietly gen Q`1'=`1'-P`1' 
end

You can download the code from the Econ 508 webpage (Routines, PQ.do) and save it. In Stata, go to "File", "Do...", and select the PQ.do file you saved; Stata runs the code as soon as the file is opened. After that you can call the routine by typing "PQ variablename". For example, typing PQ y adds two transformations of y to your list of variables:

   PQ y

This will generate two variables:
Py, the group means of y (used by the between estimator), and
Qy, the deviations from the group means of y (used by the within estimator).
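To see what PQ computes, you can compare its output with Stata's built-in egen. The following is only a sketch: it assumes PQ.do has been run, that the phuzics data are in memory, and that Py and Qy do not exist yet; ybar is a hypothetical helper name, and the tolerance is an arbitrary allowance for float rounding.

```stata
* Create Py (group mean) and Qy (deviation from group mean)
PQ y

* Independent group mean computed with egen
bysort id: egen double ybar = mean(y)

* Py should equal the group mean up to rounding
assert reldif(Py, ybar) < 1e-4

* Qy should equal y minus its group mean wherever y is observed
assert abs(Qy - (y - Py)) < 1e-4 if !missing(y)

drop ybar
```

If both assertions pass silently, Py and Qy behave as described above.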

Apply this routine to every variable used in your estimations. For example, the PQ routine is called inside the program ht.do, which computes the Hausman-Taylor instrumental variables estimators.

Estimating Phuzicists Productivity

In Problem Set 4 you are asked to explore “the phuzical revolution”. We will use this setting to see Hausman and Taylor’s approach at work. The model suggested in the Hints of the problem set is:

\[ \log y_{it} = \sum_{s=1}^{q} \rho_s \log y_{i,t-s} + f(t, t_{0i}, t - t_{0i}, r_i) + u_{it} \]

so a working model may take the form:

\[ \log y_{it} = \beta_0 + \sum_{s=1}^{2} \rho_s \log y_{i,t-s} + \beta_1 e_{it} + \beta_2 e_{it}^2 + \beta_3 \frac{1}{e_{it} \times r_i} + \beta_4 \mathrm{gender}_i + \beta_5 d80_i + \alpha_i + u_{it} \]

In order to compute the HTIV estimators we will write our own program, ht.do. The Econ 508 webpage (Routines) provides a base program with that name, which you can download in the same way as PQ.do. Some details need explaining, though:

1) If you haven't run PQ.do yet, please do so; otherwise the program ht.do will not work.

2) The program ht.do contains some features that should be adjusted according to the user, such as the path to access the data set, the directory where to create a log file, etc. So, don't forget to adjust the program to your machine.

3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments.

Thus, it is essential to read Professor Koenker's lecture notes and Hausman-Taylor (1981), and to work carefully through PS4 and the auxiliary papers, in order to understand what the program is doing and how you need to adjust it.

To make the task easier, here is a sample of the ht.do program (with small adjustments) to compute the productivity and the wages regressions:

   use "phuzics10.dta", clear 
   xtset id yr

   * Prepare variables of interest
   replace y=log(y) 
   gen exp=yr-phd 
   gen expsq=exp^2 
   gen ier=1/(exp*rphd) 
   gen d60=0 
   replace d60=1 if phd>60 
   gen y1=l.y
   gen y2=l2.y 
   * Drop observations for first two periods since they have no lagged values
   drop if y2==. 

   * Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead. 
   foreach var in y y1 y2 exp expsq rphd ier d60 sex ru Y s {
      PQ `var'
      }
   * Note the effect of PQ on the time-invariant variables, e.g. Pd60=d60, Qd60=0, Psex=sex, Qsex=0. 
   * Nonetheless, we need Pd60 and Psex later. Can you see where and why? 

   * POOLED OLS 
   reg y y1 y2 exp expsq ier d60 sex

 
      Source |       SS       df       MS              Number of obs =    5448
-------------+------------------------------           F(  7,  5440) =  854.00
       Model |  1318.75786     7   188.39398           Prob > F      =  0.0000
    Residual |  1200.07751  5440  .220602484           R-squared     =  0.5236
-------------+------------------------------           Adj R-squared =  0.5229
       Total |  2518.83537  5447  .462426175           Root MSE      =  .46968

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6604158   .0123983    53.27   0.000     .6361102    .6847213
          y2 |  -.3465397   .0119992   -28.88   0.000     -.370063   -.3230164
         exp |   .1184657    .004295    27.58   0.000     .1100459    .1268855
       expsq |  -.0024504   .0001085   -22.58   0.000    -.0026631   -.0022377
         ier |    1.28344   .1907015     6.73   0.000     .9095886    1.657291
         d60 |   .0011941   .0472823     0.03   0.980    -.0914981    .0938862
         sex |  -.0091688   .0202146    -0.45   0.650    -.0487974    .0304599
       _cons |   1.030669   .0567545    18.16   0.000     .9194075     1.14193
------------------------------------------------------------------------------

   * WITHIN ESTIMATORS (FIXED EFFECTS) 
   xtreg y y1 y2 exp expsq ier d60 sex, fe

note: d60 omitted because of collinearity
note: sex omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      5448
Group variable: id                              Number of groups   =       485

R-sq:  within  = 0.4862                         Obs per group: min =         1
       between = 0.6140                                        avg =      11.2
       overall = 0.5021                                        max =        45

                                                F(5,4958)          =    938.17
corr(u_i, Xb)  = 0.0651                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |    .543241   .0126155    43.06   0.000     .5185092    .5679729
          y2 |  -.4260858   .0120409   -35.39   0.000    -.4496912   -.4024803
         exp |    .155017   .0059574    26.02   0.000     .1433378    .1666961
       expsq |  -.0031535   .0001339   -23.56   0.000    -.0034159   -.0028911
         ier |   2.183352   .4471742     4.88   0.000     1.306693    3.060011
         d60 |          0  (omitted)
         sex |          0  (omitted)
       _cons |    1.25111   .0688997    18.16   0.000     1.116036    1.386184
-------------+----------------------------------------------------------------
     sigma_u |  .24755414
     sigma_e |  .44932935
         rho |  .23285611   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(484, 4958) =     3.84           Prob > F = 0.0000

   estimates store fe 
   * BETWEEN ESTIMATORS 
   xtreg y y1 y2 exp expsq ier d60 sex, be

Between regression (regression on group means)  Number of obs      =      5448
Group variable: id                              Number of groups   =       485

R-sq:  within  = 0.3948                         Obs per group: min =         1
       between = 0.8616                                        avg =      11.2
       overall = 0.4800                                        max =        45

                                                F(7,477)           =    424.37
sd(u_i + avg(e_i.))=  .1462628                  Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   1.135907   .0344792    32.94   0.000     1.068157    1.203657
          y2 |   -.394536   .0333223   -11.84   0.000    -.4600127   -.3290593
         exp |   .0762504   .0095279     8.00   0.000     .0575286    .0949723
       expsq |  -.0019336    .000312    -6.20   0.000    -.0025466   -.0013206
         ier |   .0411571   .1858239     0.22   0.825    -.3239776    .4062917
         d60 |  -.1681255   .1154429    -1.46   0.146     -.394965     .058714
         sex |    .009449   .0199837     0.47   0.637    -.0298179    .0487159
       _cons |   .3928153   .1065342     3.69   0.000     .1834808    .6021497
------------------------------------------------------------------------------

   estimates store be
   * GLS ESTIMATORS (RANDOM EFFECTS): 
   xtreg y y1 y2 exp expsq ier d60 sex, re

Random-effects GLS regression                   Number of obs      =      5448
Group variable: id                              Number of groups   =       485

R-sq:  within  = 0.4679                         Obs per group: min =         1
       between = 0.7652                                        avg =      11.2
       overall = 0.5236                                        max =        45

                                                Wald chi2(7)       =   5977.98
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6604158   .0123983    53.27   0.000     .6361156    .6847159
          y2 |  -.3465397   .0119992   -28.88   0.000    -.3700578   -.3230216
         exp |   .1184657    .004295    27.58   0.000     .1100477    .1268837
       expsq |  -.0024504   .0001085   -22.58   0.000    -.0026631   -.0022377
         ier |    1.28344   .1907015     6.73   0.000     .9096718    1.657208
         d60 |   .0011941   .0472823     0.03   0.980    -.0914775    .0938656
         sex |  -.0091688   .0202146    -0.45   0.650    -.0487886    .0304511
       _cons |   1.030669   .0567545    18.16   0.000     .9194322    1.141906
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .44932935
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

   estimates store re
   * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS 
   hausman fe re
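For reference, the hausman command compares the stored fixed-effects and random-effects estimates using the statistic of Hausman (1978). A sketch of its standard form, in notation that is an assumption here rather than taken from the lecture notes:

\[ H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[ \hat{V}_{FE} - \hat{V}_{RE} \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \]

Under the null hypothesis that the individual effects are uncorrelated with the regressors, H is asymptotically chi-squared with degrees of freedom equal to the number of coefficients compared. Only coefficients estimated by both models enter the comparison, so d60 and sex, which are omitted from the fixed-effects fit, drop out. A large H is evidence against the random-effects specification.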

Instrumental variables

   * INSTRUMENTAL VARIABLES (1ST ROUND) 
   ivreg y (y1 y2 exp expsq ier d60 sex = Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)


Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    5448
-------------+------------------------------           F(  7,  5440) =  703.28
       Model |  1264.03996     7  180.577137           Prob > F      =  0.0000
    Residual |  1254.79541  5440  .230660922           R-squared     =  0.5018
-------------+------------------------------           Adj R-squared =  0.5012
       Total |  2518.83537  5447  .462426175           Root MSE      =  .48027

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .5435413   .0134808    40.32   0.000     .5171137     .569969
          y2 |  -.4260609   .0128697   -33.11   0.000    -.4512907   -.4008311
         exp |   .1533415   .0061232    25.04   0.000     .1413375    .1653454
       expsq |  -.0031088   .0001361   -22.84   0.000    -.0033757    -.002842
         ier |   2.151011   .4759039     4.52   0.000     1.218049    3.083973
         d60 |   .0072021   .0487178     0.15   0.882    -.0883042    .1027085
         sex |  -.0116087   .0206865    -0.56   0.575    -.0521625     .028945
       _cons |   1.258208   .0790441    15.92   0.000      1.10325    1.413167
------------------------------------------------------------------------------
Instrumented:  y1 y2 exp expsq ier d60 sex
Instruments:   Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex
------------------------------------------------------------------------------

   predict r,res 
   PQ r 
   gen Prsq=Pr^2 
   quietly bys id: gen mark=_n 

   *What does mark do? (see next regression) 
   quietly by id: gen T=_N 
   gen iT=1/T 
   reg Prsq iT if mark==1


      Source |       SS       df       MS              Number of obs =     485
-------------+------------------------------           F(  1,   483) =   41.37
       Model |  .358413506     1  .358413506           Prob > F      =  0.0000
    Residual |  4.18495546   483  .008664504           R-squared     =  0.0789
-------------+------------------------------           Adj R-squared =  0.0770
       Total |  4.54336897   484  .009387126           Root MSE      =  .09308

------------------------------------------------------------------------------
        Prsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iT |   .1329607    .020673     6.43   0.000     .0923406    .1735808
       _cons |   .0375626   .0055915     6.72   0.000     .0265759    .0485494
------------------------------------------------------------------------------

   matrix b=get(_b) 
   gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T)) 

   *Now you need to transform the variables included in your model 
   replace y=y-(1-theta)*Py 
   replace y1=y1-(1-theta)*Py1 
   replace y2=y2-(1-theta)*Py2 
   replace exp=exp-(1-theta)*Pexp 
   replace expsq=expsq-(1-theta)*Pexpsq 
   replace ier=ier-(1-theta)*Pier 
   replace d60=d60-(1-theta)*Pd60 
   replace sex=sex-(1-theta)*Psex 
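The regression of Prsq on iT and the theta formula above can be read as a variance-components calculation. As a sketch, with notation that is an assumption here: suppose the first-round residual is r_{it} = \alpha_i + \varepsilon_{it}, with individual i observed T_i times and group mean \bar{r}_i (the variable Pr). Then

\[ E[\bar{r}_i^2] = \sigma_\alpha^2 + \sigma_\varepsilon^2 \frac{1}{T_i} \]

so in the regression of Prsq on iT (one observation per individual, which is what mark==1 selects) the intercept estimates \sigma_\alpha^2 and the slope estimates \sigma_\varepsilon^2. The weight and the transformed variables are then

\[ \theta_i = \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T_i \sigma_\alpha^2}}, \qquad x^*_{it} = x_{it} - (1 - \theta_i) \bar{x}_i \]

With the reported estimates (.1329607 for iT and .0375626 for _cons), T_i = 1 gives \theta_i \approx 0.883 and T_i = 45 gives \theta_i \approx 0.270, which match the minimum and maximum of theta reported by the summarize command at the end of the program.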

   * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION) 
   ivreg y (y1 y2 exp expsq ier d60 sex theta = Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant


Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    5448
-------------+------------------------------           F(  8,  5440) =       .
       Model |  11578.2685     8  1447.28356           Prob > F      =       .
    Residual |  1064.92083  5440  .195757505           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  12643.1893  5448  2.32070288           Root MSE      =  .44244

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .5432826   .0124213    43.74   0.000      .518932    .5676332
          y2 |  -.4260937   .0118562   -35.94   0.000    -.4493365   -.4028509
         exp |   .1548236   .0058089    26.65   0.000     .1434359    .1662114
       expsq |  -.0031484   .0001305   -24.13   0.000    -.0034042   -.0028926
         ier |    2.18223   .4399483     4.96   0.000     1.319756    3.044705
         d60 |   .0182557   .1448914     0.13   0.900    -.2657896    .3023009
         sex |  -.0125364   .0403387    -0.31   0.756    -.0916165    .0665436
       theta |   1.236176   .1561126     7.92   0.000     .9301332     1.54222
------------------------------------------------------------------------------
Instrumented:  y1 y2 exp expsq ier d60 sex theta
Instruments:   Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta
------------------------------------------------------------------------------

Why do we have theta as a variable and no intercept here?
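One way to approach this question, offered as a sketch rather than as the official answer: apply the same theta transformation used in the replace lines to the constant regressor 1, whose group mean is also 1:

\[ 1 - (1 - \theta_i) \cdot 1 = \theta_i \]

The intercept term \beta_0 of the original model therefore becomes \beta_0 \theta_i after the transformation. Because the panel is unbalanced, theta varies across individuals with T, so it is no longer a constant. It enters as an ordinary regressor together with the noconstant option, and its coefficient (1.236 above) plays the role of the intercept, comparable to _cons = 1.258 in the first-round IV regression.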

   matrix list b
   sum theta

.    matrix list b

b[1,2]
           iT      _cons
y1  .13296069  .03756264

.    summarize theta

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       theta |      5448    .4519689    .1071866   .2700443   .8830183

References:

Hausman, Jerry, 1978, “Specification Tests in Econometrics,” Econometrica, 46, pp. 1251-1271.

Hausman, Jerry, and William Taylor, 1981, “Panel Data and Unobservable Individual Effects,” Econometrica, 49, No. 6, pp. 1377-1398.

Koenker, Roger, 2014, “Panel Data,” Lecture 17, mimeo, University of Illinois at Urbana-Champaign.

¹ Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu

Contact	Office Hours	E-mail
Prof. Roger Koenker	M. & W. 2:30-3:30 or by appointment (126 DKH)	rkoenker@illinois.edu
TA Nicolas Bottan	TBA	bottan2@illinois.edu

Applied Econometrics
Econ 508 - Fall 2014

Professor: Roger Koenker

TA: Nicolas Bottan

e-TA 11: Panel Data: Hausman-Taylor Approach

