Welcome to the thirteenth issue of e-Tutorial. Here I will apply the Hausman-Taylor (1981) instrumental variables approach to the phuzics data of PS4. The estimation strategy is explained in Koenker's Lecture 16 (2005), and the respective routines to implement it are given in both STATA and R. I hope this helps with PS4. Have fun!!!
Downloading your data:
You can download your data from the Econ 508 webpage (here) and save the file in your preferred directory (I'll save mine as "C:\phuzics01.txt"). Note that the names of the variables are slightly different from those in PS4.
In the data set:
id: person identifier
sex: gender (female==1)
y: page equivalent in current year
yr: current year - 1900
rphd: rank of PhD
Y: discounted cumulative page equivalent
phd: year of PhD - 1900
ru: dummy for research university (res. pos.==1)
s: current annual salary
In R:
See Appendix A for a quick session in R.
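If you want a head start in R before Appendix A, a minimal sketch of loading the data could look like the lines below (the file name follows my choice above and is only an assumption, as is the guess that the first line of the .txt file holds the variable names; adjust the path to your own machine):
# a minimal sketch, assuming the file was saved as C:/phuzics01.txt
# and that its first line contains the variable names
phuzics <- read.table("C:/phuzics01.txt", header = TRUE)
str(phuzics)       # check variable names and types
summary(phuzics)   # quick look at the data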
In STATA:
First you need to expand the memory dedicated to STATA on your computer:
set memory 1g
Then you can infile the data by typing:
infile id yr phd sex rphd ru y Y s using "C:\phuzics01.txt"
Note: Drop the first line of observations, which has missing values (it corresponds to the variable labels in the .txt file).
After that you declare your data set as a panel, sort it by id, and summarize the data:
iis id
tis yr
sort id
summarize
Finally you can save it in STATA format (I will save mine as "C:\phuzics01.dta") and load it from the little STATA program you are going to write with your panel functions.
PQ.do
The first step towards the panel data estimation is to transform your data into group means and deviations from group means. There is a short STATA program for that, called PQ.do:
* A simple program for computing group means (P) and
* deviations from group means (Q).
capture program drop PQ
program define PQ
    version 4.0
    local options "Level(integer $S_level)"
    local varlist "req ex"
    parse "`*'"
    parse "`varlist'", parse(" ")
    sort id
    quietly by id: gen P`1'=sum(`1')/sum(`1'~=.)
    quietly by id: replace P`1'=P`1'[_N]
    quietly gen Q`1'=`1'-P`1'
end
You can download the code at the Econ 508 webpage (Routines, PQ.do) and save it. In STATA, go to "File", "Do...", and select the PQ.do file you have saved. As you open the file in STATA, it automatically runs the code. After that you can use the program by typing "PQ variablename". For example, if you type PQ y, two transformations of y will be added to your list of variables: Py, the group means of y (used by the between estimator), and Qy, the deviations from group means of y (used by the within estimator). You should apply this program to all variables used in your estimations. For example, you will see that the PQ routine is used inside the program ht.do to run the Hausman-Taylor instrumental variables estimators.
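For intuition, here is a sketch in R of what PQ produces for a single variable (an illustration only, using the data frame phuzics from the loading sketch above; the full R routines are given in Appendix A):
# group mean of y within each id (the "P" part) and deviation from it (the "Q" part)
Py <- ave(phuzics$y, phuzics$id, FUN = function(v) mean(v, na.rm = TRUE))
Qy <- phuzics$y - Py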
ht.do
The second step is to write your own program to compute the HTIV estimators. The Econ 508 webpage (Routines) provides a base program for this, called ht.do. You can download the file in the same way you did above. Some details must be explained, though:
1) If you haven't run PQ.do yet, please do so. Otherwise the program ht.do will not work.
2) The program ht.do contains some features that should be adjusted by each user, such as the path to the data set, the directory in which to create a log file, etc. So don't forget to adjust the program to your machine.
3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments. Thus, it is essential to read Koenker's Lecture 13 (2004) and Hausman and Taylor (1981), together with a careful reading of PS4 and the auxiliary papers, in order to understand what the program is doing and how you need to adjust it.
To facilitate your job, I include below a sample of the ht.do program (with small adjustments) to compute the productivity and wage regressions:
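Before going through the output, it helps to keep in mind the transformation that ht.do implements. Schematically (my notation, not necessarily that of the original sources), write sigma_u^2 for the individual-effect variance and sigma_e^2 for the idiosyncratic variance; these are estimated below as the intercept and the slope, respectively, of the regression of the squared group-mean residuals (Prsq) on the inverse group sizes (iT). For each person i with T_i observations the program then forms
\[
\theta_i = \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T_i\,\sigma_u^2}},
\qquad
\tilde{x}_{it} = x_{it} - (1-\theta_i)\,\bar{x}_{i\cdot},
\]
and reruns 2SLS on the theta-transformed variables, with the P and Q variables as instruments. This is exactly what the gen theta=... and replace ...-(1-theta)*P... lines do in the logs below.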
The Productivity Equation
. do "C:\produc01.do"
. use "C:\phuzics01.dta", clear
. iis id
. tis yr
. sort id
. replace y=log(y)
(10623 real changes made)
. gen exp=yr-phd
. gen expsq=exp^2
. gen ier=1/(exp*rphd)
. gen d60=0
. replace d60=1 if phd>60
(8788 real changes made)
. quietly by id: gen y1=y[_n-1]
. quietly by id: gen y2=y[_n-2]
. drop if y2==.
(1000 observations deleted)
. * Why can't we omit the above command?
. * Stata can handle missing obs, but here we will
. * have problems if we let Stata do our work. Why?
. * Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead.
. PQ y
. PQ y1
. PQ y2
. PQ exp
. PQ expsq
. PQ rphd
. PQ ier
. PQ d60
. PQ sex
. PQ ru
. PQ Y
. PQ s
. * Note the effect of PQ on the time-invariant variables. E.g.: Pd60=d60, Qd60=0, Psex=sex, Qsex=0.
. * Nonetheless, we need Pd60 and Psex later. Can you see where and why?
. * POOLED OLS
. regress y y1 y2 exp expsq ier d60 sex

      Source |       SS       df       MS              Number of obs =    9623
-------------+------------------------------           F(  7,  9615) = 3485.58
       Model |  2482.67217     7  354.667452           Prob > F      =  0.0000
    Residual |  978.354138  9615    .1017529           R-squared     =  0.7173
-------------+------------------------------           Adj R-squared =  0.7171
       Total |  3461.02631  9622  .359699263           Root MSE      =  .31899

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .7788283   .0099088    78.600   0.000       .759405    .7982516
          y2 |  -.2373115   .0095526   -24.843   0.000     -.2560365   -.2185865
         exp |   .0837589   .0021972    38.120   0.000      .0794519     .088066
       expsq |  -.0016344   .0000441   -37.102   0.000     -.0017207     -.001548
         ier |   3.407355   .1403287    24.281   0.000      3.132281    3.682429
         d60 |   .0204149   .0098334     2.076   0.038      .0011395    .0396904
         sex |  -.0189182   .0093459    -2.024   0.043     -.0372382   -.0005982
       _cons |   .4432838   .0192268    23.056   0.000      .4055953    .4809723
------------------------------------------------------------------------------
. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg y y1 y2 exp expsq ier d60 sex, fe

Fixed-effects (within) regression                  Number of obs      =    9623
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.6535                            Obs per group: min =       5
       between = 0.7696                                           avg =    19.2
       overall = 0.6594                                           max =      46
                                                   F(5,9118)          = 3438.90
corr(u_i, Xb) = 0.2409                             Prob > F           =  0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6573427   .0098663    66.625   0.000      .6380026    .6766828
          y2 |  -.3369351   .0094686   -35.585   0.000     -.3554957   -.3183745
         exp |    .101091   .0023692    42.669   0.000      .0964469    .1057351
       expsq |  -.0020004    .000046   -43.470   0.000     -.0020906   -.0019102
         ier |   .2556901   .2487721     1.028   0.304      -.231959    .7433392
         d60 |  (dropped)
         sex |  (dropped)
       _cons |   1.037784   .0282838    36.692   0.000      .9823415    1.093227
-------------+----------------------------------------------------------------
     sigma_u |  .20105547
     sigma_e |   .3014717
         rho |  .30784989   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(499,9118) = 3.30          Prob > F = 0.0000

. hausman, save
. * BETWEEN ESTIMATORS
. xtreg y y1 y2 exp expsq ier d60 sex, be

Between regression (regression on group means)     Number of obs      =    9623
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.5435                            Obs per group: min =       5
       between = 0.9916                                           avg =    19.2
       overall = 0.6503                                           max =      46
                                                   F(7,492)           = 8303.54
sd(u_i + avg(e_i.)) = .0323035                     Prob > F           =  0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |    1.49359   .0292156    51.123   0.000      1.436188    1.550993
          y2 |   -.519665   .0294238   -17.661   0.000     -.5774767   -.4618533
         exp |   .0045644   .0027111     1.684   0.093     -.0007622    .0098911
       expsq |  -.0001547   .0000683    -2.264   0.024     -.0002889   -.0000204
         ier |   .0081957   .0830353     0.099   0.921     -.1549517    .1713432
         d60 |  -.0079103   .0113073    -0.700   0.485     -.0301268    .0143062
         sex |  -.0039254   .0042214    -0.930   0.353     -.0122197    .0043688
       _cons |   .0810791   .0155373     5.218   0.000      .0505515    .1116067
------------------------------------------------------------------------------
. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg y y1 y2 exp expsq ier d60 sex, re

Random-effects GLS regression                      Number of obs      =    9623
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.6324                            Obs per group: min =       5
       between = 0.9390                                           avg =    19.2
       overall = 0.7173                                           max =      46
Random effects u_i ~ Gaussian                      Wald chi2(7)       = 24399.03
corr(u_i, X) = 0 (assumed)                         Prob > chi2        =  0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .7788283   .0099088    78.600   0.000      .7594074    .7982492
          y2 |  -.2373115   .0095526   -24.843   0.000     -.2560342   -.2185889
         exp |   .0837589   .0021972    38.120   0.000      .0794524    .0880654
       expsq |  -.0016344   .0000441   -37.102   0.000     -.0017207     -.001548
         ier |   3.407355   .1403287    24.281   0.000      3.132316    3.682395
         d60 |   .0204149   .0098334     2.076   0.038      .0011419     .039688
         sex |  -.0189182   .0093459    -2.024   0.043     -.0372359   -.0006005
       _cons |   .4432838   .0192268    23.056   0.000      .4056001    .4809675
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |   .3014717
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman

                 ---- Coefficients ----
             |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
             |     Prior       Current     Difference        S.E.
-------------+-------------------------------------------------------------
          y1 |   .6573427     .7788283     -.1214856           .
          y2 |  -.3369351    -.2373115     -.0996236           .
         exp |    .101091     .0837589       .017332      .0008861
       expsq |  -.0020004    -.0016344      -.000366      .0000133
         ier |   .2556901     3.407355     -3.151665      .2054152
-------------+-------------------------------------------------------------
             b = less efficient estimates obtained previously from xtreg.
             B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

      chi2(  5) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 2114.69
      Prob>chi2 = 0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress y y1 y2 exp expsq ier d60 sex (Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    9623
-------------+------------------------------           F(  7,  9615) = 2063.13
       Model |  2274.04125     7  324.863035           Prob > F      =  0.0000
    Residual |  1186.98506  9615  .123451384           R-squared     =  0.6570
-------------+------------------------------           Adj R-squared =  0.6568
       Total |  3461.02631  9622  .359699263           Root MSE      =  .35136

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6607854   .0114819    57.550   0.000      .6382785    .6832923
          y2 |  -.3351626   .0110306   -30.385   0.000      -.356785   -.3135403
         exp |   .1023073   .0027215    37.593   0.000      .0969726     .107642
       expsq |  -.0020375   .0000529   -38.501   0.000     -.0021412   -.0019338
         ier |   .3347733   .2895833     1.156   0.248      -.232871    .9024175
         d60 |   .0324825    .010838     2.997   0.003      .0112378    .0537272
         sex |  -.0198867    .010298    -1.931   0.053     -.0400729    .0002995
       _cons |   .9895473   .0332799    29.734   0.000      .9243117    1.054783
------------------------------------------------------------------------------
. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. * What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =    4.15
       Model |  .011654194     1  .011654194           Prob > F      =  0.0420
    Residual |  1.39687301   498  .002804966           R-squared     =  0.0083
-------------+------------------------------           Adj R-squared =  0.0063
       Total |   1.4085272   499    .0028227           Root MSE      =  .05296

------------------------------------------------------------------------------
        Prsq |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iT |   .1525195   .0748252     2.038   0.042      .0055075    .2995315
       _cons |   .0291845   .0053924     5.412   0.000      .0185899    .0397791
------------------------------------------------------------------------------
. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. * Now you need to transform the variables included in your model
. replace y=y-(1-theta)*Py
(9623 real changes made)
. replace y1=y1-(1-theta)*Py1
(9623 real changes made)
. replace y2=y2-(1-theta)*Py2
(9623 real changes made)
. replace exp=exp-(1-theta)*Pexp
(9623 real changes made)
. replace expsq=expsq-(1-theta)*Pexpsq
(9623 real changes made)
. replace ier=ier-(1-theta)*Pier
(9623 real changes made)
. replace d60=d60-(1-theta)*Pd60
(7874 real changes made)
. replace sex=sex-(1-theta)*Psex
(1358 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress y y1 y2 exp expsq ier d60 sex theta (Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    9623
-------------+------------------------------           F(  8,  9615) =       .
       Model |  18502.9839     8  2312.87299           Prob > F      =       .
    Residual |  907.099508  9615  .094342123           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  19410.0834  9623  2.01705117           Root MSE      =  .30715

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          y1 |   .6578081     .01005    65.454   0.000       .638108    .6775082
          y2 |  -.3367606   .0096466   -34.910   0.000     -.3556698   -.3178513
         exp |   .1012754   .0024055    42.101   0.000        .09656    .1059907
       expsq |  -.0020057   .0000468   -42.898   0.000     -.0020974   -.0019141
         ier |   .2589409   .2534411     1.022   0.307      -.237857    .7557388
         d60 |    .033342    .023551     1.416   0.157     -.0128229     .079507
         sex |    -.01201   .0203523    -0.590   0.555     -.0519049    .0278848
       theta |   1.009543   .0361539    27.924   0.000      .9386737    1.080412
------------------------------------------------------------------------------
. * Why do we have theta as a variable and no intercept here?
. matrix list b

b[1,2]
            iT       _cons
y1   .15251948   .02918446

. summarize theta

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       theta |    9623    .4463058   .0806179   .3194048   .7148795

. clear

end of do-file
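As a cross-check, the variance-component and theta steps of the log above can be sketched in R roughly as follows (this is only a sketch, not part of ht.do; it assumes a residual vector r and a strata vector id, and it mirrors what the htiv() function in Appendix A does internally):
# sketch of the theta computation, assuming r and id are already defined
Ti     <- table(id)                      # group sizes T_i
Ti.inv <- 1/as.numeric(Ti)
rbar2  <- tapply(r, id, mean)^2          # squared group-mean residuals
v      <- coef(lm(rbar2 ~ Ti.inv))       # intercept ~ sigma_u^2, slope ~ sigma_e^2
theta  <- sqrt(v[2]/(v[2] + v[1]*as.numeric(Ti)))   # one theta per group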
The Wages Equation
. do "C:\wages01.do"
. use "C:\phuzics01.dta", clear
. iis id
. tis yr
. sort id
. quietly by id: gen s1=s[_n-1]
. quietly by id: gen Y1=Y[_n-1]
. * Shall I drop the missing observations here?
. drop if s1==.
(500 observations deleted)
. replace s=log(s/s1)
(10123 real changes made)
. replace Y=log(Y/Y1)
(10123 real changes made)
. PQ s
. PQ Y
. PQ sex
. PQ ru
. * POOLED OLS
. regress s Y ru sex

      Source |       SS       df       MS              Number of obs =   10123
-------------+------------------------------           F(  3, 10119) = 3335.56
       Model |  7.67338445     3  2.55779482           Prob > F      =  0.0000
    Residual |  7.75952035 10119  .000766827           R-squared     =  0.4972
-------------+------------------------------           Adj R-squared =  0.4971
       Total |  15.4329048 10122  .001524689           Root MSE      =  .02769

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1795768   .0019558    91.816   0.000       .175743    .1834107
          ru |   .0142779   .0006994    20.414   0.000       .012907    .0156489
         sex |  -.0082762   .0007914   -10.458   0.000     -.0098275   -.0067249
       _cons |  -.0026954   .0004037    -6.676   0.000     -.0034869    -.001904
------------------------------------------------------------------------------
. * WITHIN ESTIMATORS (FIXED EFFECTS)
. xtreg s Y ru sex, fe

Fixed-effects (within) regression                  Number of obs      =   10123
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.4614                            Obs per group: min =       6
       between = 0.6455                                           avg =    20.2
       overall = 0.4855                                           max =      47
                                                   F(2,9621)          = 4120.97
corr(u_i, Xb) = 0.0939                             Prob > F           =  0.0000

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1793187   .0021088    85.035   0.000      .1751851    .1834523
          ru |   .0064123    .001264     5.073   0.000      .0039346    .0088899
         sex |  (dropped)
       _cons |  -.0022472   .0004157    -5.406   0.000     -.0030619   -.0014324
-------------+----------------------------------------------------------------
     sigma_u |  .01011922
     sigma_e |  .02691058
         rho |  .12388261   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(499,9621) = 2.19          Prob > F = 0.0000

. hausman, save
. * BETWEEN ESTIMATORS
. xtreg s Y ru sex, be

Between regression (regression on group means)     Number of obs      =   10123
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.4591                            Obs per group: min =       6
       between = 0.7096                                           avg =    20.2
       overall = 0.4971                                           max =      47
                                                   F(3,496)           =  404.06
sd(u_i + avg(e_i.)) = .0088949                     Prob > F           =  0.0000

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .2020762    .007132    28.334   0.000      .1880636    .2160888
          ru |   .0167146   .0011768    14.203   0.000      .0144025    .0190268
         sex |  -.0080017   .0011612    -6.891   0.000     -.0102832   -.0057203
       _cons |  -.0069383   .0011596    -5.983   0.000     -.0092167     -.00466
------------------------------------------------------------------------------
. * GLS ESTIMATORS (RANDOM EFFECTS):
. xtreg s Y ru sex, re

Random-effects GLS regression                      Number of obs      =   10123
Group variable (i) : id                            Number of groups   =     500
R-sq:  within  = 0.4602                            Obs per group: min =       6
       between = 0.7063                                           avg =    20.2
       overall = 0.4969                                           max =      47
Random effects u_i ~ Gaussian                      Wald chi2(3)       = 9324.37
corr(u_i, X) = 0 (assumed)                         Prob > chi2        =  0.0000

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1785767   .0019929    89.608   0.000      .1746707    .1824826
          ru |   .0123244   .0008565    14.390   0.000      .0106458     .014003
         sex |  -.0081517    .001123    -7.259   0.000     -.0103527   -.0059508
       _cons |  -.0022302   .0005162    -4.320   0.000     -.0032419   -.0012184
-------------+----------------------------------------------------------------
     sigma_u |  .00597206
     sigma_e |  .02691058
         rho |  .04693787   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. * HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
. hausman

                 ---- Coefficients ----
             |      (b)          (B)         (b-B)     sqrt(diag(V_b-V_B))
             |     Prior       Current     Difference        S.E.
-------------+-------------------------------------------------------------
           Y |   .1793187     .1785767      .0007421      .0006894
          ru |   .0064123     .0123244     -.0059121      .0009296
-------------+-------------------------------------------------------------
             b = less efficient estimates obtained previously from xtreg.
             B = fully efficient estimates obtained from xtreg.

Test:  Ho:  difference in coefficients not systematic

      chi2(  2) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 56.15
      Prob>chi2 = 0.0000
. * INSTRUMENTAL VARIABLES (1ST ROUND)
. regress s Y ru sex (PY QY Qru Psex)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10123
-------------+------------------------------           F(  3, 10119) = 3168.81
       Model |  7.58349777     3  2.52783259           Prob > F      =  0.0000
    Residual |  7.84940703 10119   .00077571           R-squared     =  0.4914
-------------+------------------------------           Adj R-squared =  0.4912
       Total |  15.4329048 10122  .001524689           Root MSE      =  .02785

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1835144   .0020489    89.569   0.000      .1794983    .1875306
          ru |   .0067056   .0013071     5.130   0.000      .0041434    .0092678
         sex |  -.0084882   .0007966   -10.656   0.000     -.0100496   -.0069268
       _cons |  -.0016707   .0004326    -3.862   0.000     -.0025186   -.0008227
------------------------------------------------------------------------------
. predict r,res
. PQ r
. gen Prsq=Pr^2
. quietly by id: gen mark=_n
. * What does mark do? (see next regression)
. quietly by id: gen T=_N
. gen iT=1/T
. regress Prsq iT if mark==1

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =    4.46
       Model |  7.4757e-08     1  7.4757e-08           Prob > F      =  0.0352
    Residual |  8.3462e-06   498  1.6759e-08           R-squared     =  0.0089
-------------+------------------------------           Adj R-squared =  0.0069
       Total |  8.4209e-06   499  1.6876e-08           Root MSE      =  .00013

------------------------------------------------------------------------------
        Prsq |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iT |   .0004526   .0002143     2.112   0.035      .0000316    .0008736
       _cons |   .0000653   .0000141     4.630   0.000      .0000376     .000093
------------------------------------------------------------------------------
. matrix b=get(_b)
. gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))
. * Now you need to transform the variables included in your model
. replace s=s-(1-theta)*Ps
(10078 real changes made)
. replace Y=Y-(1-theta)*PY
(10123 real changes made)
. replace sex=sex-(1-theta)*Psex
(1426 real changes made)
. replace ru=ru-(1-theta)*Pru
(6318 real changes made)
. * INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
. regress s Y ru sex theta (PY QY Qru Psex theta), noconstant

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10123
-------------+------------------------------           F(  4, 10119) =       .
       Model |  8.00935131     4  2.00233783           Prob > F      =       .
    Residual |  7.19637615 10119  .000711175           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  15.2057275 10123  .001502097           Root MSE      =  .02667

------------------------------------------------------------------------------
           s |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           Y |   .1809964   .0020571    87.988   0.000      .1769642    .1850287
          ru |   .0065199   .0012524     5.206   0.000       .004065    .0089748
         sex |  -.0080601   .0015735    -5.122   0.000     -.0111445   -.0049757
       theta |  -.0014172     .00067    -2.115   0.034     -.0027306   -.0001039
------------------------------------------------------------------------------
. * Why do we have theta as a variable and no intercept here?
. matrix list b

b[1,2]
            iT       _cons
y1   .00045259    .0000653

. summarize theta

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       theta |   10123    .4892002   .0797415   .3584998   .7321285

. clear

end of do-file
Appendix A: Hausman-Taylor Instrumental Variables in R
Here you can reproduce the results above using R. You should start by running the functions PQ.R, tsls.R, and htiv.R, available at the Econ 508 webpage (Routines, panel.R):
"htiv"
<-
function(x, y, id, d, z = NULL)
{
#input:
# x design matrix partitioned as given in d
# y response vector
# id strata indicator
# d list of column numbers indicating partitioning of x
# x[,d[[1]]] is x1 -- exogonous time varying vars
# x[,d[[2]]] is x2 -- endogonous time varying vars
# x[,d[[3]]] is z1 -- exogonous time invariant vars
# x[,d[[4]]] is z2 -- endogonous time invariant vars
# z may contain excluded exogonous variables if there are any
# NB. intercept is automatically included
x <- as.matrix(cbind(x, 1))
Tx <- PQ(x, id)
d[[3]] <- c(d[[3]], dim(x)[2])
Z <- cbind(z, Tx$Ph[, d[[1]]], Tx$Qh[, d[[1]]], Tx$Qh[, d[[2]]],
x[, d[[ 3]]])
r <- tsls(x, Z, y, int = F)$resid
Ti <- table(id)
Ti.inv <- 1/table(id)
rdot2 <- tapply(r, id, mean)^2
v <- lm(rdot2~Ti.inv)
v <- v$coef
theta <- as.vector(sqrt(v[2]/(v[2] + v[1] * Ti[Tx$is])))
x <- x - (1 - theta) * Tx$Ph
y <- y - (1 - theta) * PQ(y, id)$Ph
fit <- tsls(x, Z, y, int = F)
list(fit=fit,v=v)
}
"PQ"
<-
function(h, id)
{
if(is.vector(h))
h <- matrix(h, ncol = 1)
Ph <- unique(id)
Ph <- cbind(Ph, table(id))
for(i in 1:ncol(h))
Ph <- cbind(Ph, tapply(h[, i], id, mean))
is <- tapply(id, id)
Ph <- Ph[is, - (1:2)]
Qh <- h - Ph
list(Ph=as.matrix(Ph), Qh=as.matrix(Qh), is=is)
}
"tsls"
<-
function(x, z, y, int = T)
{
# two stage least squares
if(int){
x <- cbind(1, x)
z <- cbind(1, z)
}
xhat <- lm(x~z-1)$fitted.values
R <- lm(y ~ xhat -1)
R$residuals <- c(y - x %*% R$coef)
return(R)
}
Next you should prepare your data and then run htiv.R. Here is the code for that:
# a simple analysis of wages
d <- read.table("data.05",header=TRUE)
n <- length(d[,1])
h <- as.matrix(d)
h <- cbind(h[3:n,],h[3:n,]-h[2:(n-1),],h[2:(n-1),]-h[1:(n-2),])#difference
h <- h[h[,10]==0,]#ignore obs whose first diff confounds people
h <- h[h[,19]==0,]#ignore obs whose second diff confounds people
h <- cbind(h[,1:9],h[,7:9]-h[,16:18],h[,7:9]-h[,16:18]-h[,25:27])
h <- cbind(h[,1:4],h[,2]-h[,3],(h[,2]-h[,3])^2,h[,5:15])
dimnames(h)[[2]] <- c("id","yr","phd","sex","exp","exp^2","rphd","ru",
    "y","Y","s","y_1","Y_1","s_1","y_2","Y_2","s_2")
#h <- h[h[,1]<124,]
#fit productivity equation by Hausman Taylor Method
y <- log(h[,9])
x <- h[,c(12,15,5:7,3,4)]
x[,1:2] <- log(x[,1:2])
#x[,5] <- 1/x[,5]#rank of phd program
x[,6] <- as.numeric(x[,6]>60) #vintage effect of phd
x[,5] <- 1/(x[,3]*x[,5])#rank of phd program
id <- h[,1]
dimnames(x)[[2]] <- c("y","y_1","exp","exp2","rphd","phd","sex")
vl <- list(3:4,c(1:2,5),6:7,NULL)
v <- htiv(x,y,id,vl)
print("these are the variance component estimates")
print(v$v)
print(summary(v$fit))
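The script above fits the productivity equation. If you also want the wages equation of the STATA section in R, a hedged sketch is given below; the column indices (s in 11, s_1 in 14, Y in 10, Y_1 in 13, ru in 8, sex in 4) are my reading of the layout created by the dimnames line above, and the partitioning mirrors the STATA instrument set (PY QY Qru Psex). Note that this sample drops two leading observations per person (because h was built with two lags), while the STATA run drops only one, so the results will differ slightly.
#fit wages equation by Hausman Taylor Method (a sketch; column indices assumed as above)
ys <- log(h[,11]/h[,14])                      # log(s/s_1)
xw <- cbind(log(h[,10]/h[,13]), h[,8], h[,4]) # log(Y/Y_1), ru, sex
dimnames(xw)[[2]] <- c("Y","ru","sex")
vlw <- list(1, 2, 3, NULL)  # Y exog. time varying, ru endog. time varying, sex exog. time invariant
vw <- htiv(xw, ys, id, vlw)
print(summary(vw$fit))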
References:
Hausman, Jerry, 1978, "Specification Tests in Econometrics," Econometrica, 46, pp. 1251-1271.
Hausman, Jerry, and William Taylor, 1981, "Panel Data and Unobservable Individual Effects," Econometrica, 49, No. 6, pp. 1377-1398.
Koenker, Roger, 2004, "Panel Data," Lecture 13, mimeo, University of Illinois at Urbana-Champaign.