Applied Econometrics
Econ 508 - Fall 2007
e-Tutorial 12: Panel Data I - Basics
Welcome to the twelfth issue of e-Tutorial. Here I will cover the fundamentals of panel data estimation: from organizing your panel data set to testing fixed effects against random effects. In the example below I use the theoretical background of Prof. Koenker's Lecture Note 13 (2004) to reproduce the results of Greene (1997). I include STATA commands (plus some comments) where necessary, and I also provide a short introduction to panel data in R. Have fun!
Example:
Stacking your data:
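Save the data in your preferred path (I will save mine as "C:/econ508/greene14.txt") and open your preferred software.
In R: see Appendix A.
In STATA:
infile Year Firm Cost Output D1 D2 D3 D4 D5 D6 using "C:/econ508/greene14.txt"
Drop the first line of observations, which contains missing values (because the text file starts with the variable labels). The next step is to generate the log values of costs and outputs, named lnc and lny in the commands below (for example, gen lnc = log(Cost) and gen lny = log(Output)).
Finally, declare your data set as a panel, with Firm as the cross-sectional unit and Year as the time variable (for example, tsset Firm Year).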
Theoretical Background:
Consider a simplified version of equation (1) in Koenker's Lecture 13:
(1) $y_{it} = x_{it} b + a_i + u_{it}$
a) Pooled OLS:
The most basic estimator for panel data sets is Pooled OLS (POLS). Johnston & DiNardo (1997) recall that the POLS estimator ignores the panel structure of the data: it treats observations as serially uncorrelated for a given individual, with homoscedastic errors across individuals and time periods:
(2) $b_{POLS} = (X'X)^{-1}X'y$
In STATA, you can obtain the POLS as follows:
regress lnc lny
[Regression output table not recovered; Number of obs = 24]
scalar R2OLS=_result(7)
b) Fixed Effects (Within-Groups) Estimators:
In Koenker's Lecture 13 we examined the effects of applying the matrices P and Q to the data, where P takes individual means and Q takes deviations from individual means.
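The explicit definitions of P and Q are not reproduced on this page. As a reminder, and assuming a balanced panel of n individuals observed over T periods with the data stacked individual by individual, the standard definitions (consistent with the Omega used in equation (5), but stated here as an assumption rather than quoted from the lecture notes) are:

```latex
P = I_n \otimes \frac{1}{T}\,\iota_T \iota_T', \qquad Q = I_{nT} - P
```

Here $\iota_T$ is a T-vector of ones (notation introduced for this note). Py replaces each observation by its individual mean, Qy gives the deviation from that mean, and both matrices are symmetric and idempotent with PQ = 0.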
The within-groups (or fixed effects) estimator is then given by:
(3) $b_W = (X'QX)^{-1}X'Qy$
Given that Q is idempotent, this is equivalent to regressing Qy on QX, i.e., using the data in the form of deviations from individual means.
In STATA, you can obtain the within-groups estimators using the built-in function xtreg, fe:
xtreg lnc lny, fe
[Estimation output only partly recovered]
R-sq: within = 0.8774
Obs per group: min = 4
F(1,17) = 121.66
matrix bW=get(_b)
Note: The intercept shown above is an average of the individual intercepts. If you are interested in obtaining firm-specific intercepts, go to Appendix B.
Between-Groups Estimators:
(4) $b_B = (X'PX)^{-1}X'Py$
In STATA, you can obtain the between-groups estimators using the built-in function xtreg, be:
xtreg lnc lny, be
[Estimation output only partly recovered]
R-sq: within = 0.8774
Obs per group: min = 4
F(1,4) = 236.23
matrix bB=get(_b)
c) Random Effects:
Following Koenker's Lecture 13, consider the $a_i$'s as random. The model is then estimated via GLS:
(5) $b_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$
where $\Omega = \sigma_u^2 I_{nT} + T\sigma_a^2 P$.
You can obtain GLS estimators in STATA by using the built-in function xtreg, re:
xtreg lnc lny, re
[Estimation output only partly recovered]
R-sq: within = 0.8774
Obs per group: min = 4
Random effects u_i ~ Gaussian
Wald chi2(1) = 268.10
GLS as a Combination of Within- and Between-Groups Estimators:
(5.a) $b_{GLS} = \Delta b_B + (1-\Delta) b_W$
where $\Delta = V_W (V_W + V_B)^{-1}$.
In STATA, you can recover random effects GLS estimators as follows:
matrix V=VW+VB
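The remaining STATA matrix commands did not survive in this copy. The weighting in (5.a) can be sketched in R as follows. The function name combine_gls and all numbers are hypothetical, chosen only for illustration (they are not the Greene estimates), and VW and VB are assumed to be the estimated variances of the within and between estimators.

```r
# Sketch of equation (5.a): GLS as a matrix-weighted average of the
# between and within estimators. combine_gls is a hypothetical helper.
combine_gls <- function(bW, bB, VW, VB) {
  VW <- as.matrix(VW)
  VB <- as.matrix(VB)
  bW <- matrix(bW, ncol = 1)
  bB <- matrix(bB, ncol = 1)
  Delta <- VW %*% solve(VW + VB)        # weight placed on the between estimator
  I <- diag(nrow(VW))
  drop(Delta %*% bB + (I - Delta) %*% bW)
}

# Hypothetical inputs for a single regressor (illustration only)
bW <- 0.70; VW <- 0.004
bB <- 0.90; VB <- 0.012

b_gls <- combine_gls(bW, bB, VW, VB)
print(b_gls)

# Same number written as a precision-weighted average of the two estimators
b_check <- (bW / VW + bB / VB) / (1 / VW + 1 / VB)
print(b_check)
```

The less precise estimator (larger variance) receives the smaller weight, which is why the combined value lies closer to the within estimate in this illustration.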
What should I use: Fixed Effects or Random Effects? A Hausman (1978) Test Approach
Hausman (1978) suggested a test to check whether the individual effects ($a_i$) are correlated with the regressors ($X_{it}$):
- Under the Null Hypothesis: Orthogonality, i.e., no correlation between individual effects and explanatory variables. Both the random effects and fixed effects estimators are consistent, but the random effects estimator is efficient, while fixed effects is not.
- Under the Alternative Hypothesis: Individual effects are correlated with the X's. In this case, the random effects estimator is inconsistent, while the fixed effects estimator is consistent and efficient.
Greene (1997) recalls that, under the null, the estimates should not differ systematically. Thus, the test is based on the contrast vector $b_{GLS} - b_W$, through the statistic H:
(6) $H = (b_{GLS} - b_W)'[V(b_W) - V(b_{GLS})]^{-1}(b_{GLS} - b_W) \sim \chi^2(k)$
where k is the number of regressors in X (excluding the constant).
In STATA, you can obtain that as follows:
xtreg lnc lny, fe
[Coefficient comparison table not recovered]
Test: Ho: difference in coefficients not systematic
chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
So, based on the test above, the test statistic (10.86) is greater than the 5% critical value of a Chi-squared with 1 degree of freedom (3.84). Therefore, we reject the null hypothesis. Given this result, the preferred model is fixed effects.
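As a rough sketch of how the statistic in (6) is assembled, the R code below computes H from fixed effects and random effects estimates and compares it with the Chi-squared critical value. The function hausman_stat and its example inputs are hypothetical (they are not the Greene estimates); only the 10.86 in the last line comes from the test result reported above.

```r
# Sketch of equation (6). hausman_stat is a hypothetical helper, not a
# built-in R function.
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d  <- matrix(b_re - b_fe, ncol = 1)          # contrast vector
  dV <- as.matrix(V_fe) - as.matrix(V_re)      # V(b_W) - V(b_GLS)
  H  <- drop(t(d) %*% solve(dV) %*% d)
  df <- nrow(d)                                # number of slope coefficients
  list(H = H, df = df, p.value = pchisq(H, df, lower.tail = FALSE))
}

# Hypothetical one-regressor inputs (illustration only)
res <- hausman_stat(b_fe = 0.70, b_re = 0.75, V_fe = 0.004, V_re = 0.003)
print(res$H)
print(res$p.value)

# 5% critical value of a Chi-squared with 1 degree of freedom
crit <- qchisq(0.95, df = 1)
print(crit)

# The statistic reported in the text (10.86) exceeds the critical value
print(10.86 > crit)
```

In small samples the difference V(b_W) - V(b_GLS) is not guaranteed to be positive definite, so a negative H from real data signals a problem with the variance estimates rather than evidence for either model.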
Appendix A: Quick Session in R
The first thing to do is to download the data, save it in your preferred directory (I will save mine as C:/econ508/greene14.txt), and read the data into R:
greene14<-read.table("C:/econ508/greene14.txt", header=T)
Next you need to extract each variable from the data set (column names as in the STATA infile command above):
year<-greene14$Year
firm<-greene14$Firm
cost<-greene14$Cost
output<-greene14$Output
And transform them into logs (usually you don't need to, but it will facilitate the use of panel functions later):
lnc<-log(cost)
lny<-log(output)
Finally, you will call the library MASS, to use the vcov function (in current versions of R, vcov is provided by the stats package, which is loaded by default):
library(MASS)
Pooled OLS
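The original R commands for this step did not survive in this copy. As a minimal sketch (not the author's code), pooled OLS can be computed either from equation (2) or with lm(). The small data set below is hypothetical and only stands in for the log cost and log output variables so that the block runs on its own; with the Greene data you would use the log variables from the previous step instead.

```r
# Hypothetical toy panel: 3 firms observed for 4 periods (not Greene's data)
firm <- rep(1:3, each = 4)
lny  <- c(1, 2, 3, 4,  2, 3, 4, 5,  3, 4, 5, 6)
lnc  <- c(0.6, 1.0, 1.4, 2.1,  2.0, 2.4, 3.1, 3.5,  4.4, 5.1, 5.4, 6.0)

# Pooled OLS ignores the firm index: regress lnc on a constant and lny
X <- cbind(1, lny)
b_pols <- solve(t(X) %*% X, t(X) %*% lnc)   # equation (2): (X'X)^{-1} X'y

# The same estimate through lm(); vcov() returns its covariance matrix
pols <- lm(lnc ~ lny)
print(drop(b_pols))
print(coef(pols))
print(vcov(pols))
```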
Fixed Effects:
#Start copying here:
#This function computes matrices of means and deviations from means
#Finish copying here.
Next, you will extract the between and the within data:
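The function itself is missing from this copy. One way to build such matrices, offered here as a sketch rather than the original code, is to form P and Q with a Kronecker product. The names panel_PQ and T_per and the toy data are hypothetical, and the construction assumes a balanced panel stacked firm by firm.

```r
# Hypothetical toy panel, sorted by firm: n = 3 firms, T_per = 4 periods
n <- 3
T_per <- 4
firm <- rep(1:n, each = T_per)
lny  <- c(1, 2, 3, 4,  2, 3, 4, 5,  3, 4, 5, 6)
lnc  <- c(0.6, 1.0, 1.4, 2.1,  2.0, 2.4, 3.1, 3.5,  4.4, 5.1, 5.4, 6.0)

# P averages within each firm; Q = I - P takes deviations from firm means.
# Valid only for a balanced panel stacked firm by firm.
panel_PQ <- function(n, T_per) {
  P <- kronecker(diag(n), matrix(1 / T_per, T_per, T_per))
  Q <- diag(n * T_per) - P
  list(P = P, Q = Q)
}
pq <- panel_PQ(n, T_per)

# Between data: firm means, repeated for every period
lnc_b <- drop(pq$P %*% lnc)
lny_b <- drop(pq$P %*% lny)

# Within data: deviations from firm means
lnc_w <- drop(pq$Q %*% lnc)
lny_w <- drop(pq$Q %*% lny)

# Cross-check against base R's ave(), which returns group means
print(all.equal(lnc_b, ave(lnc, firm)))
```

For large or unbalanced panels, forming the full nT x nT matrices is wasteful; ave(x, firm) and x - ave(x, firm) give the same between and within data without them.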
#Fixed Effects (Within Estimators):
#Between Estimator:
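The commands under these two headings are missing from this copy. A minimal sketch of both estimators in base R follows, using hypothetical toy data in place of the Greene variables. The within regression is cross-checked against a dummy-variable regression, and its variance is rescaled because lm() on demeaned data does not subtract the n firm means from the degrees of freedom.

```r
# Hypothetical toy panel: 3 firms observed for 4 periods (not Greene's data)
firm <- rep(1:3, each = 4)
lny  <- c(1, 2, 3, 4,  2, 3, 4, 5,  3, 4, 5, 6)
lnc  <- c(0.6, 1.0, 1.4, 2.1,  2.0, 2.4, 3.1, 3.5,  4.4, 5.1, 5.4, 6.0)

# Within (fixed effects): regress deviations from firm means on each other
lnc_w <- lnc - ave(lnc, firm)
lny_w <- lny - ave(lny, firm)
within <- lm(lnc_w ~ lny_w - 1)
bW <- coef(within)[["lny_w"]]

# Degrees-of-freedom correction: nT - n - k instead of nT - k
n_firms <- length(unique(firm))
df_fe <- length(lnc) - n_firms - 1
VW <- vcov(within)[1, 1] * df.residual(within) / df_fe

# Cross-check: least squares with one dummy per firm gives the same slope
lsdv <- lm(lnc ~ lny + factor(firm) - 1)
print(c(within = bW, lsdv = coef(lsdv)[["lny"]]))

# Between: regress firm means on firm means (one observation per firm)
lnc_bar <- as.numeric(tapply(lnc, firm, mean))
lny_bar <- as.numeric(tapply(lny, firm, mean))
between <- lm(lnc_bar ~ lny_bar)
bB <- coef(between)[["lny_bar"]]
print(bB)
```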
Appendix B: Recovering Alphas from Fixed Effects (Least Squares Dummy Variables)
Suppose you are interested in obtaining a specific regression for firm 3. For example, many international economists need a country-specific equation when they are dealing with country panels. If you are in this situation, don't worry. The fixed effects estimators already take all individual effects into account. The only mysterious thing is that the individual intercepts are not shown in the regression output. In the example above, the intercept shown in the fixed effects output is not specific to any firm. Instead, it is an average of all firms' intercepts.
You can recover the intercept of your cross-sectional unit after using fixed effects estimators. For the example above, let's estimate the fixed effects model including dummy variables for each firm, instead of a common intercept (some authors call this Least Squares Dummy Variables, but it is the same fixed effects you saw earlier). In STATA:
regress lnc lny D1 D2 D3 D4 D5 D6, noconst
[Regression output table not recovered; Number of obs = 24]
The slope is obviously the same. The only change is the replacement of the common intercept by 6 dummies, each of them representing a cross-sectional unit.
Now suppose you would like to know whether the differences in the firm effects are statistically significant. How to do that?
- Run the fixed effects regression above, including both the intercept and the dummies:
regress lnc lny D1 D2 D3 D4 D5 D6
[Regression output table not recovered]
Note that one of the dummies is dropped (due to perfect collinearity with the constant), and all other dummies are expressed as the difference between their original value and the constant. (The value of the constant in this second regression equals the value of the dropped dummy in the previous regression. The dropped dummy is seen as the benchmark.)
- Obtain the R-squared from the restricted (POLS) and unrestricted (fixed effects with dummies) models:
scalar R2LSDV=_result(7)
- Perform the traditional F-test, comparing the unrestricted regression with the restricted regression:
$F = \frac{(R^2_u - R^2_p)/(n-1)}{(1 - R^2_u)/(nT - n - k)}$
where the subscript "u" refers to the unrestricted regression (fixed effects with dummies), and the subscript "p" to the restricted regression (POLS). Here n = 6 firms, nT = 24 observations, and k = 1 regressor. Under the null hypothesis of a common intercept, POLS is efficient.
scalar F=((R2LSDV-R2OLS)/(6-1))/((1-R2LSDV)/(24-6-1))
The result above can be compared with the critical value of F(5,17), which equals 4.34 at the 1% level. Therefore, we reject the null hypothesis of a common intercept for all firms.
References:
Last update: November 6, 2007