
Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. Here we apply the Hausman-Taylor (1981) instrumental variables approach to the phuzics data of Problem Set 4. The estimation strategy is explained in Prof. Koenker’s Lecture Note 17. [1]

Data

The first thing you need to do is download the phuzics panel data set, called phuzics10.txt, from the Econ 508 web site. Save it in your preferred directory.

The next step is to load the data into Stata. After setting your working directory (where you saved the data), run:

  infile  id  yr  phd  sex  rphd  ru  y  Y  s  using  "phuzics10.txt", clear

Note: The first line of the .txt file holds the variable labels, so infile reads it as an observation with missing values; you should drop it. Next, declare the data a panel data set:

drop if id==.
xtset id yr
Finally, you can save the data in the STATA format (I will save mine as "phuzics10.dta") and load it from a small STATA program that you are going to write around your panel functions.
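
The save step described above can be sketched as follows. This is only a minimal sketch: it assumes the infile, drop and xtset commands shown earlier have just been run, so the panel is already in memory, and it uses the file name "phuzics10.dta" mentioned in the text.

```stata
* Assumes the panel is already in memory and declared with xtset id yr.
* Describe the panel structure (number of ids, span of yr, participation patterns).
xtdescribe

* Save in Stata format in the current working directory, overwriting any old copy.
save "phuzics10.dta", replace
```

Looking at the xtdescribe output before saving is a quick way to confirm that the header line was dropped and that the panel is unbalanced, as the regression logs further down suggest.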


PQ.do

The first step towards the panel data estimation is to transform your data into group means and deviations from group means. There is a small STATA program for that, called PQ.do:
* deviations from group means (Q).
capture program drop PQ
program define PQ
version 4.0
local options "Level(integer $S_level)"
local varlist "req ex"
parse "`*'"
parse "`varlist'",parse(" ")
sort id
quietly by id: gen P`1'=sum(`1')/sum(`1'~=.)
quietly by id: replace P`1'=P`1'[_N]
quietly gen Q`1'=`1'-P`1'
end
You can download the code from the Econ 508 webpage (Routines, PQ.do) and save it. In STATA, go to "File", "Do...", and select the PQ.do file you have saved. Opening the file this way runs the code automatically, which defines the program. After that you can use the function by typing "PQ variablename". For example, if you type PQ y, two transformations of y will be added to your list of variables:
   PQ y 
This will generate two variables:
Py  for the group means of y (used by the between estimators), and
Qy  for the deviations from group means of y (used by the within estimators).

You should apply this function to all variables used in your estimations. For example, you will see that the PQ routine is called inside the program ht.do, which runs the Hausman-Taylor instrumental variables estimators.
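
To see what the P and Q transformations do, here is a small self-contained sketch on a made-up unbalanced panel (the numbers are invented and are not the phuzics data). It uses egen rather than PQ.do itself, so it is an illustration of the same idea, not the course routine.

```stata
* Toy unbalanced panel: id 1 has two periods, id 2 has three (made-up values).
clear
input id yr y
1 1 2
1 2 4
2 1 1
2 2 3
2 3 5
end

* Group means (P) and deviations from group means (Q), sorted by yr within id.
bysort id (yr): egen double Py = mean(y)
generate double Qy = y - Py

list id yr y Py Qy, sepby(id)

* Deviations from the group mean average to zero within each id.
bysort id (yr): egen double mQy = mean(Qy)
assert abs(mQy) < 1e-12
```

Both groups happen to have mean 3 here, so Py is 3 in every row, while Qy is -1, 1 for id 1 and -2, 0, 2 for id 2. For a variable that does not vary over time, the same steps give P equal to the variable itself and Q equal to zero.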


Estimating Phuzicists Productivity

In Problem Set 4 you are asked to explore “the phuzical revolution”. We will use this setting to see Hausman and Taylor’s approach at work. The model suggested in the Hints of the problem set is:

\[ \log y_{it} = \sum_{s=1}^{q} \rho_s \log y_{i,t-s} + f(t, t_{0i}, t - t_{0i}, r_i) + u_{it} \]

so a working model may take the form:

\[ \log y_{it} = \beta_0 + \sum_{s=1}^{2} \rho_s \log y_{i,t-s} + \beta_1 e_{it} + \beta_2 e_{it}^2 + \beta_3 \frac{1}{e_{it} \times r_i} + \beta_4 \mathrm{gender}_i + \beta_5 \mathrm{d80}_i + \alpha_i + u_{it} \]


To compute the HTIV estimators we will write our own program, ht.do. The Econ 508 webpage (Routines) provides a base program for this, also called ht.do, which you can download in the same way as PQ.do above. Some details must be explained, though:

1) If you haven't run PQ.do yet, please do so. Otherwise the program ht.do will not work.

2) The program ht.do contains some features that should be adjusted according to the user, such as the path to access the data set, the directory where to create a log file, etc. So, don't forget to adjust the program to your machine.

3) The most important detail: the user should specify the model, create new variables, and decide which variables will be included in the regression and/or treated as instruments.

Thus, it is essential to read Professor Koenker's lecture notes and Hausman-Taylor (1981), and to work carefully through PS4 and the auxiliary papers, in order to understand what the program is doing and how you need to adjust it.

To make the task easier, here is a sample of the ht.do program (with small adjustments) to compute the productivity and the wages regressions:
   use "phuzics10.dta", clear 
xtset id yr

* Prepare variables of interest
replace y=log(y)
gen exp=yr-phd
gen expsq=exp^2
gen ier=1/(exp*rphd)
gen d60=0
replace d60=1 if phd>60
gen y1=l.y
gen y2=l2.y
* Drop observations for first two periods since they have no lagged values
drop if y2==.

* Did you forget to run PQ.do before this program? If so, try again; otherwise, go ahead.
foreach var in y y1 y2 exp expsq rphd ier d60 sex ru Y s {
PQ `var'
}
* Note the effect of PQ on the time-invariant variables, e.g. Pd60=d60, Qd60=0, Psex=sex, Qsex=0.
* Nonetheless, we need Pd60 and Psex later. Can you see where and why?

* POOLED OLS
reg y y1 y2 exp expsq ier d60 sex
 
Source | SS df MS Number of obs = 5448
-------------+------------------------------ F( 7, 5440) = 854.00
Model | 1318.75786 7 188.39398 Prob > F = 0.0000
Residual | 1200.07751 5440 .220602484 R-squared = 0.5236
-------------+------------------------------ Adj R-squared = 0.5229
Total | 2518.83537 5447 .462426175 Root MSE = .46968

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 | .6604158 .0123983 53.27 0.000 .6361102 .6847213
y2 | -.3465397 .0119992 -28.88 0.000 -.370063 -.3230164
exp | .1184657 .004295 27.58 0.000 .1100459 .1268855
expsq | -.0024504 .0001085 -22.58 0.000 -.0026631 -.0022377
ier | 1.28344 .1907015 6.73 0.000 .9095886 1.657291
d60 | .0011941 .0472823 0.03 0.980 -.0914981 .0938862
sex | -.0091688 .0202146 -0.45 0.650 -.0487974 .0304599
_cons | 1.030669 .0567545 18.16 0.000 .9194075 1.14193
------------------------------------------------------------------------------
   * WITHIN ESTIMATORS (FIXED EFFECTS) 
xtreg y y1 y2 exp expsq ier d60 sex, fe
note: d60 omitted because of collinearity
note: sex omitted because of collinearity

Fixed-effects (within) regression Number of obs = 5448
Group variable: id Number of groups = 485

R-sq: within = 0.4862 Obs per group: min = 1
between = 0.6140 avg = 11.2
overall = 0.5021 max = 45

F(5,4958) = 938.17
corr(u_i, Xb) = 0.0651 Prob > F = 0.0000

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 | .543241 .0126155 43.06 0.000 .5185092 .5679729
y2 | -.4260858 .0120409 -35.39 0.000 -.4496912 -.4024803
exp | .155017 .0059574 26.02 0.000 .1433378 .1666961
expsq | -.0031535 .0001339 -23.56 0.000 -.0034159 -.0028911
ier | 2.183352 .4471742 4.88 0.000 1.306693 3.060011
d60 | 0 (omitted)
sex | 0 (omitted)
_cons | 1.25111 .0688997 18.16 0.000 1.116036 1.386184
-------------+----------------------------------------------------------------
sigma_u | .24755414
sigma_e | .44932935
rho | .23285611 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(484, 4958) = 3.84 Prob > F = 0.0000
   estimates store fe 
* BETWEEN ESTIMATORS
xtreg y y1 y2 exp expsq ier d60 sex, be
Between regression (regression on group means)  Number of obs      =      5448
Group variable: id Number of groups = 485

R-sq: within = 0.3948 Obs per group: min = 1
between = 0.8616 avg = 11.2
overall = 0.4800 max = 45

F(7,477) = 424.37
sd(u_i + avg(e_i.))= .1462628 Prob > F = 0.0000

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 | 1.135907 .0344792 32.94 0.000 1.068157 1.203657
y2 | -.394536 .0333223 -11.84 0.000 -.4600127 -.3290593
exp | .0762504 .0095279 8.00 0.000 .0575286 .0949723
expsq | -.0019336 .000312 -6.20 0.000 -.0025466 -.0013206
ier | .0411571 .1858239 0.22 0.825 -.3239776 .4062917
d60 | -.1681255 .1154429 -1.46 0.146 -.394965 .058714
sex | .009449 .0199837 0.47 0.637 -.0298179 .0487159
_cons | .3928153 .1065342 3.69 0.000 .1834808 .6021497
------------------------------------------------------------------------------
   estimates store be
* GLS ESTIMATORS (RANDOM EFFECTS):
xtreg y y1 y2 exp expsq ier d60 sex, re
Random-effects GLS regression                   Number of obs      =      5448
Group variable: id Number of groups = 485

R-sq: within = 0.4679 Obs per group: min = 1
between = 0.7652 avg = 11.2
overall = 0.5236 max = 45

Wald chi2(7) = 5977.98
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000

------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 | .6604158 .0123983 53.27 0.000 .6361156 .6847159
y2 | -.3465397 .0119992 -28.88 0.000 -.3700578 -.3230216
exp | .1184657 .004295 27.58 0.000 .1100477 .1268837
expsq | -.0024504 .0001085 -22.58 0.000 -.0026631 -.0022377
ier | 1.28344 .1907015 6.73 0.000 .9096718 1.657208
d60 | .0011941 .0472823 0.03 0.980 -.0914775 .0938656
sex | -.0091688 .0202146 -0.45 0.650 -.0487886 .0304511
_cons | 1.030669 .0567545 18.16 0.000 .9194322 1.141906
-------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | .44932935
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------
   estimates store re
* HAUSMAN TEST: FIXED VS. RANDOM EFFECTS
hausman fe re
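
For reference, the comparison requested by the hausman command above is based on the Hausman (1978) statistic. The expression below is the textbook form, written in our own notation (it is not taken from the lecture notes): \hat\beta_{FE} and \hat\beta_{RE} are the within and GLS estimates of the coefficients being compared, and \hat V_{FE}, \hat V_{RE} are their estimated covariance matrices.

\[ H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \hat V_{FE} - \hat V_{RE} \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \]

Under the null hypothesis that the individual effects are uncorrelated with the regressors, H is asymptotically chi-squared, with degrees of freedom equal to the number of coefficients compared. Since d60 and sex are omitted from the within regression, the comparison can only involve the time-varying regressors. Note also that the GLS output above reports sigma_u = 0, so in this sample the random-effects estimates coincide with pooled OLS.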
Instrumental variables
   * INSTRUMENTAL VARIABLES (1ST ROUND) 
ivreg y (y1 y2 exp expsq ier d60 sex = Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex)

Instrumental variables (2SLS) regression

Source | SS df MS Number of obs = 5448
-------------+------------------------------ F( 7, 5440) = 703.28
Model | 1264.03996 7 180.577137 Prob > F = 0.0000
Residual | 1254.79541 5440 .230660922 R-squared = 0.5018
-------------+------------------------------ Adj R-squared = 0.5012
Total | 2518.83537 5447 .462426175 Root MSE = .48027

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 | .5435413 .0134808 40.32 0.000 .5171137 .569969
y2 | -.4260609 .0128697 -33.11 0.000 -.4512907 -.4008311
exp | .1533415 .0061232 25.04 0.000 .1413375 .1653454
expsq | -.0031088 .0001361 -22.84 0.000 -.0033757 -.002842
ier | 2.151011 .4759039 4.52 0.000 1.218049 3.083973
d60 | .0072021 .0487178 0.15 0.882 -.0883042 .1027085
sex | -.0116087 .0206865 -0.56 0.575 -.0521625 .028945
_cons | 1.258208 .0790441 15.92 0.000 1.10325 1.413167
------------------------------------------------------------------------------
Instrumented: y1 y2 exp expsq ier d60 sex
Instruments: Pexp Qexp Pexpsq Qexpsq Qy1 Qy2 Qier Pd60 Psex
------------------------------------------------------------------------------
predict r,res
PQ r
gen Prsq=Pr^2
quietly bys id: gen mark=_n

*What does mark do? (see next regression)
quietly by id: gen T=_N
gen iT=1/T
reg Prsq iT if mark==1

Source | SS df MS Number of obs = 485
-------------+------------------------------ F( 1, 483) = 41.37
Model | .358413506 1 .358413506 Prob > F = 0.0000
Residual | 4.18495546 483 .008664504 R-squared = 0.0789
-------------+------------------------------ Adj R-squared = 0.0770
Total | 4.54336897 484 .009387126 Root MSE = .09308

------------------------------------------------------------------------------
Prsq | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
iT | .1329607 .020673 6.43 0.000 .0923406 .1735808
_cons | .0375626 .0055915 6.72 0.000 .0265759 .0485494
------------------------------------------------------------------------------
   matrix b=get(_b) 
gen theta=sqrt(_b[iT]/(_b[iT]+_b[_cons]*T))

*Now you need to transform the variables included in your model
replace y=y-(1-theta)*Py
replace y1=y1-(1-theta)*Py1
replace y2=y2-(1-theta)*Py2
replace exp=exp-(1-theta)*Pexp
replace expsq=expsq-(1-theta)*Pexpsq
replace ier=ier-(1-theta)*Pier
replace d60=d60-(1-theta)*Pd60
replace sex=sex-(1-theta)*Psex
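
The auxiliary regression and the theta formula above can be read as follows. This is a sketch under the usual random-effects assumptions (\alpha_i and u_{it} uncorrelated, u_{it} homoskedastic and serially uncorrelated); the symbols \sigma_\alpha^2, \sigma_u^2, T_i and \bar r_i are our notation and do not appear in ht.do. The group mean of the first-round residuals is roughly \alpha_i + \bar u_i, so

\[ E[\bar r_i^2] \approx \sigma_\alpha^2 + \sigma_u^2 \cdot \frac{1}{T_i} \]

The regression of Prsq on iT therefore estimates \sigma_u^2 by the coefficient on iT and \sigma_\alpha^2 by the constant, which is how the two coefficients enter theta:

\[ \theta_i = \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T_i \sigma_\alpha^2}}, \qquad x_{it}^{*} = x_{it} - (1 - \theta_i)\,\bar x_i \]

Because the panel is unbalanced, \theta_i differs across individuals: the more periods an individual is observed, the smaller \theta_i and the closer the transformation is to full demeaning.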

* INSTRUMENTAL VARIABLES (AFTER THETA CORRECTION)
ivreg y (y1 y2 exp expsq ier d60 sex theta = Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta), noconstant

Instrumental variables (2SLS) regression

Source | SS df MS Number of obs = 5448
-------------+------------------------------ F( 8, 5440) = .
Model | 11578.2685 8 1447.28356 Prob > F = .
Residual | 1064.92083 5440 .195757505 R-squared = .
-------------+------------------------------ Adj R-squared = .
Total | 12643.1893 5448 2.32070288 Root MSE = .44244

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 | .5432826 .0124213 43.74 0.000 .518932 .5676332
y2 | -.4260937 .0118562 -35.94 0.000 -.4493365 -.4028509
exp | .1548236 .0058089 26.65 0.000 .1434359 .1662114
expsq | -.0031484 .0001305 -24.13 0.000 -.0034042 -.0028926
ier | 2.18223 .4399483 4.96 0.000 1.319756 3.044705
d60 | .0182557 .1448914 0.13 0.900 -.2657896 .3023009
sex | -.0125364 .0403387 -0.31 0.756 -.0916165 .0665436
theta | 1.236176 .1561126 7.92 0.000 .9301332 1.54222
------------------------------------------------------------------------------
Instrumented: y1 y2 exp expsq ier d60 sex theta
Instruments: Qy1 Qy2 Qier Pexp Qexp Pexpsq Qexpsq Pd60 Psex theta
------------------------------------------------------------------------------
Why do we have theta as a variable and no intercept here?
   matrix list b
sum theta
. matrix list b

b[1,2]
           iT      _cons
y1  .13296069  .03756264

. summarize theta

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
theta | 5448 .4519689 .1071866 .2700443 .8830183
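
As a quick arithmetic check, the range of theta can be reproduced by hand from the two coefficients in b. The sketch below is self-contained; the scalar names s2_u and s2_a are ours, the coefficient values are copied from the matrix listing above, and the group sizes 1 and 45 are the minimum and maximum observations per group reported in the xtreg output.

```stata
* Coefficients copied from the auxiliary regression of Prsq on iT.
scalar s2_u = .13296069    // coefficient on iT
scalar s2_a = .03756264    // constant

* theta for the shortest (T = 1) and longest (T = 45) individual histories.
display sqrt(s2_u/(s2_u + s2_a*1))
display sqrt(s2_u/(s2_u + s2_a*45))
```

The two values should come out at roughly 0.883 and 0.270, which agree (up to rounding) with the Max and Min of theta in the summary table.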


References:

Hausman, Jerry, 1978, “Specification Tests in Econometrics,” Econometrica, 46, pp. 1251-1271.

Hausman, Jerry, and William Taylor, 1981, “Panel Data and Unobservable Individual Effects,” Econometrica, 49, No. 6, pp. 1377-1398.

Koenker, Roger, 2014, “Panel Data,” Lecture 17, mimeo, University of Illinois at Urbana-Champaign.


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu