Welcome to the thirteenth issue of e-Tutorial. Here I will apply the Hausman-Taylor (1981) instrumental variables approach to the phuzics data of PS4. The estimation strategy is explained in Koenker's Lecture 16 (2005), and the respective routines to implement it are given in both STATA and R. I hope this helps with PS4. Have fun!!!
Downloading your data:
You can download your data from the Econ 508 webpage (here) and save the file in your preferred directory (I'll save mine as "C:\phuzics01.txt"). Note that the names of the variables are slightly different from those in PS4.
In the data set:
id: person identifier
sex: gender (female==1)
y: page equivalent in current year
yr: current year - 1900
rphd: rank of PhD
Y: discounted cumulative page equivalent
phd: year of PhD - 1900
ru: dummy for research university (res. pos.==1)
s: current annual salary
In R:
See Appendix A for a quick session in R.
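If you want a head start in R before Appendix A, a minimal sketch of loading the data could look like the lines below (the file name follows my choice above and is only an assumption, as is the guess that the first line of the .txt file holds the variable names; adjust the path to your own machine):
# a minimal sketch, assuming the file was saved as C:/phuzics01.txt
# and that its first line contains the variable names
phuzics <- read.table("C:/phuzics01.txt", header = TRUE)
str(phuzics)       # check variable names and types
summary(phuzics)   # quick look at the data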
In STATA:
First you need to expand the memory dedicated to STATA on your computer:
set memory 1g
Then you can infile the data by typing:
infile id yr phd sex rphd ru y Y s using "C:\phuzics01.txt"
Note: Drop the first line of observations, which has missing values (it corresponds to the variable labels in the .txt file).
After that you declare your data set as a panel, sort it by id, and summarize the data:
iis id
tis yr
sort id
summarize
Finally you can save it in STATA format (I will save mine as "C:\phuzics01.dta") and load it from the little STATA program you are going to write with your panel functions.
PQ.do
The first step towards the panel data estimation is to transform your data into group means and deviations from group means. There is a short STATA program for that, called PQ.do:
* A simple program for computing group means (P) and
* deviations from group means (Q).
capture program drop PQ
program define PQ
    version 4.0
    local options "Level(integer $S_level)"
    local varlist "req ex"
    parse "`*'"
    parse "`varlist'", parse(" ")
    sort id
    quietly by id: gen P`1'=sum(`1')/sum(`1'~=.)
    quietly by id: replace P`1'=P`1'[_N]
    quietly gen Q`1'=`1'-P`1'
end
You can download the code at the Econ 508 webpage (Routines, PQ.do) and save it. In STATA, go to "File", "Do...", and select the PQ.do file you have saved. As you open the file in STATA, it automatically runs the code. After that you can use the program by typing "PQ variablename". For example, if you type PQ y, two transformations of y will be added to your list of variables: Py, the group means of y (used by the between estimator), and Qy, the deviations from group means of y (used by the within estimator). You should apply this program to all variables used in your estimations. For example, you will see that the PQ routine is used inside the program ht.do to run the Hausman-Taylor instrumental variables estimators.
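For intuition, here is a sketch in R of what PQ produces for a single variable (an illustration only, using the data frame phuzics from the loading sketch above; the full R routines are given in Appendix A):
# group mean of y within each id (the "P" part) and deviation from it (the "Q" part)
Py <- ave(phuzics$y, phuzics$id, FUN = function(v) mean(v, na.rm = TRUE))
Qy <- phuzics$y - Py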
ht.do
The second step is to write your own program to compute the HTIV estimators. The Econ 508 webpage (Routines) provides a base program for this, called ht.do. You can download the file in the same way you did above. Some details must be explained, though:
1) If you haven't run PQ.do yet, please do so. Otherwise the program ht.do will not work.
2) The program ht.do contains some features that should be adjusted by each user, such as the path to the data set, the directory in which to create a log file, etc. So don't forget to adjust the program to your machine.
3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments. Thus, it is essential to read Koenker's Lecture 13 (2004) and Hausman and Taylor (1981), together with a careful reading of PS4 and the auxiliary papers, in order to understand what the program is doing and how you need to adjust it.
To facilitate your job, I include below a sample of the ht.do program (with small adjustments) to compute the productivity and wage regressions:
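Before going through the output, it helps to keep in mind the transformation that ht.do implements. Schematically (my notation, not necessarily that of the original sources), write sigma_u^2 for the individual-effect variance and sigma_e^2 for the idiosyncratic variance; these are estimated below as the intercept and the slope, respectively, of the regression of the squared group-mean residuals (Prsq) on the inverse group sizes (iT). For each person i with T_i observations the program then forms
\[
\theta_i = \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T_i\,\sigma_u^2}},
\qquad
\tilde{x}_{it} = x_{it} - (1-\theta_i)\,\bar{x}_{i\cdot},
\]
and reruns 2SLS on the theta-transformed variables, with the P and Q variables as instruments. This is exactly what the gen theta=... and replace ...-(1-theta)*P... lines do in the logs below.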
The Productivity Equation
. do "C:\produc01.do"
. use "C:\phuzics01.dta", clear
. iis id
. tis yr
. sort id
. replace y=log(y)
(10623 real changes made)
. gen exp=yr-phd
. gen expsq=exp^2
. gen ier=1/(exp*rphd)
. gen d60=0
. replace d60=1 if phd>60
(8788 real changes made)
. quietly by id: gen y1=y[_n-1]
. quietly by id: gen y2=y[_n-2]
. drop if y2==.
(1000 observations deleted)
. * Why can't we omit the above command?
. * Stata can handle missing obs, but here we will
. * have problems if we let Stata do our work. Why?
. * Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead.
. PQ y
. PQ y1
. PQ y2
. PQ exp
. PQ expsq
. PQ rphd
. PQ ier
. PQ d60
. PQ sex
. PQ ru
. PQ Y
. PQ s
. * Note the effect of PQ on the time-invariant variables. E.g.: Pd60=d60, Qd60=0, Psex=sex, Qsex=0.
. * Nonetheless, we need Pd60 and Psex later. Can you see where and why?
. * POOLED OLS
. regress y y1 y2 exp expsq ier d60 sex

      Source |       SS       df       MS              Number of obs =    9623
-------------+------------------------------           F(  7,  9615) = 3485.58
       Model |  2482.67217     7  354.667452           Prob > F      =  0.0000
    Residual |  978.354138  9615    .1017529           R-squared     =  0.7173
-------------+------------------------------           Adj R-squared =  0.7171
       Total |  3461.02631  9622  .359699263           Root MSE      =  .31899

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .7788283   .0099088    78.600   0.000       .759405    .7982516
          y2 |  -.2373115   .0095526   -24.843   0.000     -.2560365   -.2185865
         exp |   .0837589   .0021972    38.120   0.000      .0794519     .088066
       expsq |  -.0016344   .0000441   -37.102   0.000     -.0017207     -.001548
         ier |   3.407355   .1403287    24.281   0.000      3.132281    3.682429
         d60 |   .0204149   .0098334     2.076   0.038      .0011395    .0396904
         sex |  -.0189182   .0093459    -2.024   0.043     -.0372382   -.0005982
       _cons |   .4432838   .0192268    23.056   0.000      .4055953    .4809723
------------------------------------------------------------------------------
. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg y y1 y2 exp expsq ier d60 sex, fe

Fixed-effects (within) regression                  Number of obs      =    9623
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.6535                            Obs per group: min =       5
       between = 0.7696                                           avg =    19.2
       overall = 0.6594                                           max =      46
                                                   F(5,9118)          = 3438.90
corr(u_i, Xb) = 0.2409                             Prob > F           =  0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6573427   .0098663    66.625   0.000      .6380026    .6766828
          y2 |  -.3369351   .0094686   -35.585   0.000     -.3554957   -.3183745
         exp |    .101091   .0023692    42.669   0.000      .0964469    .1057351
       expsq |  -.0020004    .000046   -43.470   0.000     -.0020906   -.0019102
         ier |   .2556901   .2487721     1.028   0.304      -.231959    .7433392
         d60 |  (dropped)
         sex |  (dropped)
       _cons |   1.037784   .0282838    36.692   0.000      .9823415    1.093227
-------------+----------------------------------------------------------------
     sigma_u |  .20105547
     sigma_e |   .3014717
         rho |  .30784989   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(499,9118) = 3.30          Prob > F = 0.0000

. hausman, save
. * BETWEEN ESTIMATORS
. xtreg y y1 y2 exp expsq ier d60 sex, be

Between regression (regression on group means)     Number of obs      =    9623
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.5435                            Obs per group: min =       5
       between = 0.9916                                           avg =    19.2
       overall = 0.6503                                           max =      46
                                                   F(7,492)           = 8303.54
sd(u_i + avg(e_i.)) = .0323035                     Prob > F           =  0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |    1.49359   .0292156    51.123   0.000      1.436188    1.550993
          y2 |   -.519665   .0294238   -17.661   0.000     -.5774767   -.4618533
         exp |   .0045644   .0027111     1.684   0.093     -.0007622    .0098911
       expsq |  -.0001547   .0000683    -2.264   0.024     -.0002889   -.0000204
         ier |   .0081957   .0830353     0.099   0.921     -.1549517    .1713432
         d60 |  -.0079103   .0113073    -0.700   0.485     -.0301268    .0143062
         sex |  -.0039254   .0042214    -0.930   0.353     -.0122197    .0043688
       _cons |   .0810791   .0155373     5.218   0.000      .0505515    .1116067
------------------------------------------------------------------------------
. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg y y1 y2 exp expsq ier d60 sex, re

Random-effects GLS regression                      Number of obs      =    9623
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.6324                            Obs per group: min =       5
       between = 0.9390                                           avg =    19.2
       overall = 0.7173                                           max =      46
Random effects u_i ~ Gaussian                      Wald chi2(7)       = 24399.03
corr(u_i, X) = 0 (assumed)                         Prob > chi2        =  0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .7788283   .0099088    78.600   0.000      .7594074    .7982492
          y2 |  -.2373115   .0095526   -24.843   0.000     -.2560342   -.2185889
         exp |   .0837589   .0021972    38.120   0.000      .0794524    .0880654
       expsq |  -.0016344   .0000441   -37.102   0.000     -.0017207     -.001548
         ier |   3.407355   .1403287    24.281   0.000      3.132316    3.682395
         d60 |   .0204149   .0098334     2.076   0.038      .0011419     .039688
         sex |  -.0189182   .0093459    -2.024   0.043     -.0372359   -.0006005
       _cons |   .4432838   .0192268    23.056   0.000      .4056001    .4809675
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |   .3014717
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman

                 ---- Coefficients ----
             |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
             |     Prior       Current     Difference        S.E.
-------------+-------------------------------------------------------------
          y1 |   .6573427     .7788283     -.1214856           .
          y2 |  -.3369351    -.2373115     -.0996236           .
         exp |    .101091     .0837589       .017332      .0008861
       expsq |  -.0020004    -.0016344      -.000366      .0000133
         ier |   .2556901     3.407355     -3.151665      .2054152
-------------+-------------------------------------------------------------
             b = less efficient estimates obtained previously from xtreg.
             B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

      chi2(  5) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 2114.69
      Prob>chi2 = 0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress y y1 y2 exp expsq ier d60 sex (Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    9623
-------------+------------------------------           F(  7,  9615) = 2063.13
       Model |  2274.04125     7  324.863035           Prob > F      =  0.0000
    Residual |  1186.98506  9615  .123451384           R-squared     =  0.6570
-------------+------------------------------           Adj R-squared =  0.6568
       Total |  3461.02631  9622  .359699263           Root MSE      =  .35136

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6607854   .0114819    57.550   0.000      .6382785    .6832923
          y2 |  -.3351626   .0110306   -30.385   0.000      -.356785   -.3135403
         exp |   .1023073   .0027215    37.593   0.000      .0969726     .107642
       expsq |  -.0020375   .0000529   -38.501   0.000     -.0021412   -.0019338
         ier |   .3347733   .2895833     1.156   0.248      -.232871    .9024175
         d60 |   .0324825    .010838     2.997   0.003      .0112378    .0537272
         sex |  -.0198867    .010298    -1.931   0.053     -.0400729    .0002995
       _cons |   .9895473   .0332799    29.734   0.000      .9243117    1.054783
------------------------------------------------------------------------------
. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. * What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =    4.15
       Model |  .011654194     1  .011654194           Prob > F      =  0.0420
    Residual |  1.39687301   498  .002804966           R-squared     =  0.0083
-------------+------------------------------           Adj R-squared =  0.0063
       Total |   1.4085272   499    .0028227           Root MSE      =  .05296

------------------------------------------------------------------------------
        Prsq |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iT |   .1525195   .0748252     2.038   0.042      .0055075    .2995315
       _cons |   .0291845   .0053924     5.412   0.000      .0185899    .0397791
------------------------------------------------------------------------------
. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. * Now you need to transform the variables included in your model
. replace y=y-(1-theta)*Py
(9623 real changes made)
. replace y1=y1-(1-theta)*Py1
(9623 real changes made)
. replace y2=y2-(1-theta)*Py2
(9623 real changes made)
. replace exp=exp-(1-theta)*Pexp
(9623 real changes made)
. replace expsq=expsq-(1-theta)*Pexpsq
(9623 real changes made)
. replace ier=ier-(1-theta)*Pier
(9623 real changes made)
. replace d60=d60-(1-theta)*Pd60
(7874 real changes made)
. replace sex=sex-(1-theta)*Psex
(1358 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress y y1 y2 exp expsq ier d60 sex theta (Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    9623
-------------+------------------------------           F(  8,  9615) =       .
       Model |  18502.9839     8  2312.87299           Prob > F      =       .
    Residual |  907.099508  9615  .094342123           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  19410.0834  9623  2.01705117           Root MSE      =  .30715

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6578081     .01005    65.454   0.000       .638108    .6775082
          y2 |  -.3367606   .0096466   -34.910   0.000     -.3556698   -.3178513
         exp |   .1012754   .0024055    42.101   0.000        .09656    .1059907
       expsq |  -.0020057   .0000468   -42.898   0.000     -.0020974   -.0019141
         ier |   .2589409   .2534411     1.022   0.307      -.237857    .7557388
         d60 |    .033342    .023551     1.416   0.157     -.0128229     .079507
         sex |    -.01201   .0203523    -0.590   0.555     -.0519049    .0278848
       theta |   1.009543   .0361539    27.924   0.000      .9386737    1.080412
------------------------------------------------------------------------------
. * Why do we have theta as a variable and no intercept here?
. matrix list b

b[1,2]
            iT       _cons
y1   .15251948   .02918446

. summarize theta

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       theta |    9623    .4463058   .0806179   .3194048   .7148795

. clear

end of do-file
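As a cross-check, the variance-component and theta steps of the log above can be sketched in R roughly as follows (this is only a sketch, not part of ht.do; it assumes a residual vector r and a strata vector id, and it mirrors what the htiv() function in Appendix A does internally):
# sketch of the theta computation, assuming r and id are already defined
Ti     <- table(id)                      # group sizes T_i
Ti.inv <- 1/as.numeric(Ti)
rbar2  <- tapply(r, id, mean)^2          # squared group-mean residuals
v      <- coef(lm(rbar2 ~ Ti.inv))       # intercept ~ sigma_u^2, slope ~ sigma_e^2
theta  <- sqrt(v[2]/(v[2] + v[1]*as.numeric(Ti)))   # one theta per group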
The Wages Equation
. do "C:\wages01.do"
. use "C:\phuzics01.dta", clear
. iis id
. tis yr
. sort id
. quietly by id: gen s1=s[_n-1]
. quietly by id: gen Y1=Y[_n-1]
. * Shall I drop the missing observations here?
. drop if s1==.
(500 observations deleted)
. replace s=log(s/s1)
(10123 real changes made)
. replace Y=log(Y/Y1)
(10123 real changes made)
. PQ s
. PQ Y
. PQ sex
. PQ ru
. * POOLED OLS
. regress s Y ru sex

      Source |       SS       df       MS              Number of obs =   10123
-------------+------------------------------           F(  3, 10119) = 3335.56
       Model |  7.67338445     3  2.55779482           Prob > F      =  0.0000
    Residual |  7.75952035 10119  .000766827           R-squared     =  0.4972
-------------+------------------------------           Adj R-squared =  0.4971
       Total |  15.4329048 10122  .001524689           Root MSE      =  .02769

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1795768   .0019558    91.816   0.000       .175743    .1834107
          ru |   .0142779   .0006994    20.414   0.000       .012907    .0156489
         sex |  -.0082762   .0007914   -10.458   0.000     -.0098275   -.0067249
       _cons |  -.0026954   .0004037    -6.676   0.000     -.0034869    -.001904
------------------------------------------------------------------------------
. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg s Y ru sex, fe

Fixed-effects (within) regression                  Number of obs      =   10123
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.4614                            Obs per group: min =       6
       between = 0.6455                                           avg =    20.2
       overall = 0.4855                                           max =      47
                                                   F(2,9621)          = 4120.97
corr(u_i, Xb) = 0.0939                             Prob > F           =  0.0000

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1793187   .0021088    85.035   0.000      .1751851    .1834523
          ru |   .0064123    .001264     5.073   0.000      .0039346    .0088899
         sex |  (dropped)
       _cons |  -.0022472   .0004157    -5.406   0.000     -.0030619   -.0014324
-------------+----------------------------------------------------------------
     sigma_u |  .01011922
     sigma_e |  .02691058
         rho |  .12388261   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(499,9621) = 2.19          Prob > F = 0.0000

. hausman, save
. * BETWEEN ESTIMATORS
. xtreg s Y ru sex, be

Between regression (regression on group means)     Number of obs      =   10123
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.4591                            Obs per group: min =       6
       between = 0.7096                                           avg =    20.2
       overall = 0.4971                                           max =      47
                                                   F(3,496)           =  404.06
sd(u_i + avg(e_i.)) = .0088949                     Prob > F           =  0.0000

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .2020762    .007132    28.334   0.000      .1880636    .2160888
          ru |   .0167146   .0011768    14.203   0.000      .0144025    .0190268
         sex |  -.0080017   .0011612    -6.891   0.000     -.0102832   -.0057203
       _cons |  -.0069383   .0011596    -5.983   0.000     -.0092167     -.00466
------------------------------------------------------------------------------
. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg s Y ru sex, re

Random-effects GLS regression                      Number of obs      =   10123
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.4602                            Obs per group: min =       6
       between = 0.7063                                           avg =    20.2
       overall = 0.4969                                           max =      47
Random effects u_i ~ Gaussian                      Wald chi2(3)       = 9324.37
corr(u_i, X) = 0 (assumed)                         Prob > chi2        =  0.0000

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1785767   .0019929    89.608   0.000      .1746707    .1824826
          ru |   .0123244   .0008565    14.390   0.000      .0106458     .014003
         sex |  -.0081517    .001123    -7.259   0.000     -.0103527   -.0059508
       _cons |  -.0022302   .0005162    -4.320   0.000     -.0032419   -.0012184
-------------+----------------------------------------------------------------
     sigma_u |  .00597206
     sigma_e |  .02691058
         rho |  .04693787   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman

                 ---- Coefficients ----
             |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
             |     Prior       Current     Difference        S.E.
-------------+-------------------------------------------------------------
           Y |   .1793187     .1785767      .0007421      .0006894
          ru |   .0064123     .0123244     -.0059121      .0009296
-------------+-------------------------------------------------------------
             b = less efficient estimates obtained previously from xtreg.
             B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

      chi2(  2) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 56.15
      Prob>chi2 = 0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress s Y ru sex (PY QY Qru Psex)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10123
-------------+------------------------------           F(  3, 10119) = 3168.81
       Model |  7.58349777     3  2.52783259           Prob > F      =  0.0000
    Residual |  7.84940703 10119   .00077571           R-squared     =  0.4914
-------------+------------------------------           Adj R-squared =  0.4912
       Total |  15.4329048 10122  .001524689           Root MSE      =  .02785

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1835144   .0020489    89.569   0.000      .1794983    .1875306
          ru |   .0067056   .0013071     5.130   0.000      .0041434    .0092678
         sex |  -.0084882   .0007966   -10.656   0.000     -.0100496   -.0069268
       _cons |  -.0016707   .0004326    -3.862   0.000     -.0025186   -.0008227
------------------------------------------------------------------------------
. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. * What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =    4.46
       Model |  7.4757e-08     1  7.4757e-08           Prob > F      =  0.0352
    Residual |  8.3462e-06   498  1.6759e-08           R-squared     =  0.0089
-------------+------------------------------           Adj R-squared =  0.0069
       Total |  8.4209e-06   499  1.6876e-08           Root MSE      =  .00013

------------------------------------------------------------------------------
        Prsq |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iT |   .0004526   .0002143     2.112   0.035      .0000316    .0008736
       _cons |   .0000653   .0000141     4.630   0.000      .0000376     .000093
------------------------------------------------------------------------------
. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. * Now you need to transform the variables included in your model
. replace s=s-(1-theta)*Ps
(10078 real changes made)
. replace Y=Y-(1-theta)*PY
(10123 real changes made)
. replace sex=sex-(1-theta)*Psex
(1426 real changes made)
. replace ru=ru-(1-theta)*Pru
(6318 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress s Y ru sex theta (PY QY Qru Psex theta), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10123
-------------+------------------------------           F(  4, 10119) =       .
       Model |  8.00935131     4  2.00233783           Prob > F      =       .
    Residual |  7.19637615 10119  .000711175           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  15.2057275 10123  .001502097           Root MSE      =  .02667

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1809964   .0020571    87.988   0.000      .1769642    .1850287
          ru |   .0065199   .0012524     5.206   0.000       .004065    .0089748
         sex |  -.0080601   .0015735    -5.122   0.000     -.0111445   -.0049757
       theta |  -.0014172     .00067    -2.115   0.034     -.0027306   -.0001039
------------------------------------------------------------------------------
. * Why do we have theta as a variable and no intercept here?
. matrix list b

b[1,2]
            iT       _cons
y1   .00045259    .0000653

. summarize theta

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       theta |   10123    .4892002   .0797415   .3584998   .7321285

. clear

end of do-file
Appendix A: Hausman-Taylor Instrumental Variables in R
Here you can reproduce the results above using R. You should start by running the functions PQ.R, tsls.R, and htiv.R, available at the Econ 508 webpage (Routines, panel.R):
"htiv"
<-
function(x, y, id, d, z = NULL)
{
#input:
# x design matrix partitioned as given in d
# y response vector
# id strata indicator
# d list of column numbers indicating partitioning of x
# x[,d[[1]]] is x1 -- exogonous time varying vars
# x[,d[[2]]] is x2 -- endogonous time varying vars
# x[,d[[3]]] is z1 -- exogonous time invariant vars
# x[,d[[4]]] is z2 -- endogonous time invariant vars
# z may contain excluded exogonous variables if there are any
# NB. intercept is automatically included
x <- as.matrix(cbind(x, 1))
Tx <- PQ(x, id)
d[[3]] <- c(d[[3]], dim(x)[2])
Z <- cbind(z, Tx$Ph[, d[[1]]], Tx$Qh[, d[[1]]], Tx$Qh[, d[[2]]],
x[, d[[ 3]]])
r <- tsls(x, Z, y, int = F)$resid
Ti <- table(id)
Ti.inv <- 1/table(id)
rdot2 <- tapply(r, id, mean)^2
v <- lm(rdot2~Ti.inv)
v <- v$coef
theta <- as.vector(sqrt(v[2]/(v[2] + v[1] * Ti[Tx$is])))
x <- x - (1 - theta) * Tx$Ph
y <- y - (1 - theta) * PQ(y, id)$Ph
fit <- tsls(x, Z, y, int = F)
list(fit=fit,v=v)
}
"PQ"
<-
function(h, id)
{
if(is.vector(h))
h <- matrix(h, ncol = 1)
Ph <- unique(id)
Ph <- cbind(Ph, table(id))
for(i in 1:ncol(h))
Ph <- cbind(Ph, tapply(h[, i], id, mean))
is <- tapply(id, id)
Ph <- Ph[is, - (1:2)]
Qh <- h - Ph
list(Ph=as.matrix(Ph), Qh=as.matrix(Qh), is=is)
}
"tsls"
<-
function(x, z, y, int = T)
{
# two stage least squares
if(int){
x <- cbind(1, x)
z <- cbind(1, z)
}
xhat <- lm(x~z-1)$fitted.values
R <- lm(y ~ xhat -1)
R$residuals <- c(y - x %*% R$coef)
return(R)
}
Next you should prepare your data and then run htiv.R. Here is the code for that:
# a simple analysis of wages
d <- read.table("data.05",header=TRUE)
n <- length(d[,1])
h <- as.matrix(d)
h <- cbind(h[3:n,],h[3:n,]-h[2:(n-1),],h[2:(n-1),]-h[1:(n-2),])#difference
h <- h[h[,10]==0,]#ignore obs whose first diff confounds people
h <- h[h[,19]==0,]#ignore obs whose second diff confounds people
h <- cbind(h[,1:9],h[,7:9]-h[,16:18],h[,7:9]-h[,16:18]-h[,25:27])
h <- cbind(h[,1:4],h[,2]-h[,3],(h[,2]-h[,3])^2,h[,5:15])
dimnames(h)[[2]] <- c("id","yr","phd","sex","exp","exp^2","rphd","ru",
    "y","Y","s","y_1","Y_1","s_1","y_2","Y_2","s_2")
#h <- h[h[,1]<124,]
#fit productivity equation by Hausman Taylor Method
y <- log(h[,9])
x <- h[,c(12,15,5:7,3,4)]
x[,1:2] <- log(x[,1:2])
#x[,5] <- 1/x[,5]#rank of phd program
x[,6] <- as.numeric(x[,6]>60) #vintage effect of phd
x[,5] <- 1/(x[,3]*x[,5])#rank of phd program
id <- h[,1]
dimnames(x)[[2]] <- c("y","y_1","exp","exp2","rphd","phd","sex")
vl <- list(3:4,c(1:2,5),6:7,NULL)
v <- htiv(x,y,id,vl)
print("these are the variance component estimates")
print(v$v)
print(summary(v$fit))
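The script above fits the productivity equation. If you also want the wages equation of the STATA section in R, a hedged sketch is given below; the column indices (s in 11, s_1 in 14, Y in 10, Y_1 in 13, ru in 8, sex in 4) are my reading of the layout created by the dimnames line above, and the partitioning mirrors the STATA instrument set (PY QY Qru Psex). Note that this sample drops two leading observations per person (because h was built with two lags), while the STATA run drops only one, so the results will differ slightly.
#fit wages equation by Hausman Taylor Method (a sketch; column indices assumed as above)
ys <- log(h[,11]/h[,14])                      # log(s/s_1)
xw <- cbind(log(h[,10]/h[,13]), h[,8], h[,4]) # log(Y/Y_1), ru, sex
dimnames(xw)[[2]] <- c("Y","ru","sex")
vlw <- list(1, 2, 3, NULL)  # Y exog. time varying, ru endog. time varying, sex exog. time invariant
vw <- htiv(xw, ys, id, vlw)
print(summary(vw$fit))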
References:
Hausman, Jerry, 1978, "Specification Tests in Econometrics," Econometrica, 46, pp. 1251-1271.
Hausman, Jerry, and William Taylor, 1981, "Panel Data and Unobservable Individual Effects," Econometrica, 49, No. 6, pp. 1377-1398.
Koenker, Roger, 2004, "Panel Data," Lecture 13, mimeo, University of Illinois at Urbana-Champaign.