Applied Econometrics
e-Tutorial 7: Autocorrelation, ARCH, and Heteroscedasticity
Welcome to the seventh tutorial of Econ 508. In this issue I summarize some well known tests for autocorrelation and ARCH processes. I draw on Johnston and DiNardo's (1997) Econometric Methods and Professor Koenker's Lecture 7. I also provide additional support on testing for heteroscedasticity (see the Appendix) and a starting point for those who want to explore further aspects of ARCH and GARCH processes. This document is due to Roberto Perrelli.

Tests for Autocorrelated Errors

Background: If you run a regression without lagged variables and detect autocorrelation, your OLS estimators are unbiased and consistent, but inefficient, and they provide incorrect standard errors. If you include lagged dependent variables among the covariates and still detect autocorrelation, you are in bigger trouble: the OLS estimators are inconsistent. To test for the presence of autocorrelation, you have a large menu of options. Here I suggest the Breusch-Godfrey test, and I show how to implement it using the dataset AUTO2.dta, available in .dta format (STATA users) and in ascii (R users) from the Econ 508 web page (Data).
Test: Breusch-Godfrey

Background: Suppose you are running a version of model (2) of Problem Set 2, with the original data replaced by AUTO2. Then your model is:

gas_t = b0 + b1*income_t + b2*price_t + b3*(price_t)^2 + b4*(price_t*income_t) + u_t

and you wish to test whether the disturbances are autocorrelated. The steps are as follows:

(i) Run OLS on your original equation:

sort quarter
gen price2=price*price
gen priceinc=price*income
regress gas income price price2 priceinc
(STATA regression output omitted; Number of obs = 128.)
(ii) Obtain the estimated residuals:

predict uhat, res

(iii) Regress the estimated residuals (uhat) on the explanatory variables of the original model (income, price, price2, priceinc, constant) and the lagged residuals (L.uhat). Call this the auxiliary regression:

regress uhat income price price2 priceinc L.uhat
(STATA regression output omitted; Number of obs = 127.)
(iv) From the auxiliary regression above, obtain the R-squared and multiply it by the number of included observations:

scalar N=_result(1)

(v) Under the null hypothesis of no autocorrelation, the test statistic NR2 converges asymptotically to a Chi-squared with s degrees of freedom, where s is the number of lags of the residuals included in the auxiliary regression. In the case above, s=1, and we have:

scalar chi15=invchi(1, .05)

In the example above, NR2 = 115.45 > 3.84 = Chi-squared(1, 5%). Hence, we reject the null hypothesis of no autocorrelation in the disturbances.

Test for ARCH Errors

For a brief introduction to ARCH processes, see the Econ 508 web page (e-TA). To test for ARCH errors, you can use an LM test as follows:

(i) Run OLS on your original equation:

regress gas income price price2 priceinc
(STATA regression output omitted; Number of obs = 128.)
(ii) Generate the residuals and the squared residuals:

predict uhat, res
gen uhat2=uhat*uhat

(iii) Regress the squared residuals on the explanatory variables of the original model (income, price, price2, priceinc, constant) and lagged squared residuals. Call this an auxiliary regression:

regress uhat2 L.uhat2 L2.uhat2 L3.uhat2 L4.uhat2 income price price2 priceinc
(STATA regression output omitted; Number of obs = 124.)
(iv) From the auxiliary regression, obtain NR2 and compare it with the Chi-squared(4, 5%) critical value:

scalar N=_result(1)
scalar chi45=invchi(4, .05)

Under the null hypothesis of no ARCH errors, the test statistic NR2 converges asymptotically to a Chi-squared with q degrees of freedom, where q is the number of lags of the squared residuals included in the auxiliary regression. In the case above, q=4, and NR2 = 89.06 > 9.48 = Chi-squared(4, 5%). Therefore, we reject the null hypothesis of no ARCH and admit that our regression presents time-varying variance.

Appendix: Tests for Heteroscedasticity

Under heteroscedastic errors, it is well known that OLS estimators are unbiased and consistent, but inefficient, and they provide incorrect standard errors. Hence it is very important to detect this anomaly in your regression. I will illustrate how to
test for heteroscedasticity using Current Population Survey (CPS) data consisting of 100 observations on wages, educational level, years of experience, and unionization status of U.S. male workers. The data were borrowed from J&DN's (1997) Econometric Methods and slightly adjusted for the purposes of this tutorial. The variables are lnwage (the log wage), grade (educational level), exp (years of experience), and union (a unionization dummy).
The data are available in both STATA (.dta) and R (ascii) formats on the Econ 508 web site (Data). After you download the data, the next step is to run a "traditional" wages equation involving the variables described above. In STATA, you can do that as follows:

*Generate the variable experience squared:
gen exp2=exp*exp

*Run the wages equation:
regress lnwage grade exp exp2 union
(STATA regression output omitted; Number of obs = 100.)
Test 1: White

Here the strategy is as follows:

(i) Run the OLS wages regression (as above).

(ii) Get the residuals:

predict error, res

(iii) Generate the squared residuals:

gen error2=error*error

(iv) Generate new explanatory variables, in the form of the squares of the explanatory variables and the cross-products of the explanatory variables:

gen grade2=grade^2

and generate the remaining squares and cross-products (exp3, exp4, gradexp, gradexp2, gradeuni, expunion, exp2uni) analogously. Because union is a dummy variable, its squared values are equal to the original values, so we don't need to add the squared dummy to the model. Also, squared experience is already in the original model (in the form of exp2), so we don't need to add it to this auxiliary regression.

(v) Regress the squared residuals on a constant, the original explanatory variables, and the set of auxiliary explanatory variables (squares and cross-products) you've just created:

regress error2 grade exp exp2 union grade2 exp4 exp3 gradexp gradexp2 gradeuni expunion exp2uni
(STATA regression output omitted; Number of obs = 100.)
(vi) Get the sample size (N) and the R-squared (R2), and construct the test statistic N*R2:

scalar N=_result(1)

(vii) Under the null hypothesis the errors are homoscedastic, and NR2 is asymptotically distributed as a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. In this case, k=13. Hence, for a Chi-squared with 12 degrees of freedom at 5%, we have:

scalar chi125=invchi(12, .05)

We observe that the test statistic NR2 is near 10.79, while the Chi-squared(12, 5%) critical value is about 21.03, much bigger than the test statistic. Hence, the null hypothesis (homoscedasticity) cannot be rejected.

Test 2: Breusch-Pagan-Godfrey

The Lagrange Multiplier test proposed by BPG can be executed as follows:

(i) Run the OLS regression (as you've done above; the output is omitted):

regress lnwage grade exp exp2 union

(ii) Get the sum of the squared residuals:

scalar RSS=_result(4)

(iii) Generate a disturbance correction factor in the form of the sum of the squared residuals divided by the sample size:

scalar N=_result(1)
scalar sigmahat=RSS/N

(iv) Regress the adjusted squared errors (the original squared errors divided by the correction factor) on a list of explanatory variables supposed to influence the heteroscedasticity. Following J&DN, we assume that, from the original dataset, only the main variables grade, exp, and union affect the heteroscedasticity. Hence:

gen adjerr2=(error^2)/sigmahat
regress adjerr2 grade exp union

(STATA regression output omitted; Number of obs = 100.)
(v) This auxiliary regression gives you a model (explained) sum of squares of ESS = 10.70.

(vi) Under the null hypothesis of homoscedasticity, (1/2)ESS asymptotically converges to a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. In this case, k=4. Hence, comparing (1/2)ESS with a Chi-squared with 3 degrees of freedom at 5%, we have:

scalar halfESS=(1/2)*ESS
scalar chi35=invchi(3, .05)

The calculated statistic halfESS = 5.35, while the critical value of a Chi-squared(3, 5%) = 7.81. Therefore, the test statistic falls short of the critical value, and the null hypothesis of homoscedasticity cannot be rejected.
Test 3: Goldfeld-Quandt

Suppose now you believe a single explanatory variable is responsible for most of the heteroscedasticity in your model. For example, let's say that experience (exp) is the "trouble-maker" variable. You can then proceed with the Goldfeld-Quandt test as follows:

(i) Sort your data according to the variable exp. Then divide the data into, say, three parts, drop the observations in the central part, and run separate regressions for the bottom part (Regression 1) and the top part (Regression 2). After each regression, save the respective residual sum of squares (RSS):

sort exp
gen index=_n
regress lnwage grade exp exp2 union if index<=35
(STATA regression output omitted; Number of obs = 35.)
scalar RSS1=_result(4)

regress lnwage grade exp exp2 union if index>65
(STATA regression output omitted; Number of obs = 35.)
scalar RSS2=_result(4)

(ii) Then compute the ratio of the residual sums of squares, R = RSS2/RSS1. Under the null hypothesis of homoscedasticity, this ratio R is distributed as an F with ((n-c-2k)/2, (n-c-2k)/2) degrees of freedom, where n is the sample size, c is the number of dropped observations, and k is the number of regressors in the model (including the constant). In the example above, n=100, c=30, and k=5. Hence R ~ F(30, 30), and under the null R should fall below the 5% critical value:

scalar R=RSS2/RSS1
scalar F30305=invfprob(30,30,.05)

Here R < F(30, 30, 5%), so we cannot reject the null hypothesis of homoscedasticity.
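The R walkthrough below does not repeat this Goldfeld-Quandt step, so here is a minimal R sketch of the same split-sample procedure. It is not part of the original tutorial: it assumes the CPS data frame d.d and the variable names used in the R section that follows (lnwage, grade, exp, union), and it mirrors the 35/30/35 split used above.

#Goldfeld-Quandt sketch (assumes d.d has been read as in the R section below)
d.s<-d.d[order(d.d$exp),]        #sort by the suspect variable
d.s$exp2<-d.s$exp^2
low<-d.s[1:35,]                  #bottom part (Regression 1)
high<-d.s[66:100,]               #top part (Regression 2); central 30 observations dropped
rss1<-sum(resid(lm(lnwage~grade+exp+exp2+union,data=low))^2)
rss2<-sum(resid(lm(lnwage~grade+exp+exp2+union,data=high))^2)
R<-rss2/rss1
c(R=R,F.crit=qf(.95,30,30))      #reject homoscedasticity if R exceeds F.crit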
USING R

library(dyn)
d.gas<-read.table("http://www.econ.uiuc.edu/~econ472/AUTO2.txt",header=T)
attach(d.gas)
gas<-ts(gas,start=1959,frequency=4)
price<-ts(price,start=1959,frequency=4)
income<-ts(income,start=1959,frequency=4)
miles<-ts(miles,start=1959,frequency=4)
price2<-price^2
princ<-price*income

Test: Breusch-Godfrey

(i) Run OLS on your original equation:

model<-lm(gas~income+price+price2+princ)
(ii) Obtain the estimated residuals:

uhat<-model$resid
uhat<-ts(uhat,start=1959,frequency=4)

(iii) Regress the estimated residuals (uhat) on the explanatory variables of the original model (income, price, price2, princ, constant) and the lagged residuals. Call this the auxiliary regression:

model.adj<-dyn$lm(uhat~lag(uhat,-1)+income+price+price2+princ)

(iv) From the auxiliary regression above, obtain the R-squared and multiply it by the number of included observations:

R2<-summary(model.adj)$r.squared
R2
[1] 0.9091105

#Constructing R2 by hand:
SSR<-sum((model.adj$resid)^2)
SSR
[1] 0.03442517
#Note that R gives you only the fitted values, so you need to subtract the mean
#to obtain the explained sum of squares.
SSE<-sum((model.adj$fitted-mean(uhat))^2)
SSE
[1] 0.3443454
SST<-SSR+SSE
R2<-SSE/SST
(v) Under the null hypothesis of no autocorrelation, the test statistic NR2 converges asymptotically to a Chi-squared with s degrees of freedom, where s is the number of lags of the residuals included in the auxiliary regression. In the case above, s=1, and we have:

N<-127   #Sample size
# or
N<-(model$df)+length(model$coef)
N*R2
[1] 116.3665

In the example above, NR2 = 116.3665 > 3.84 = Chi-squared(1, 5%). Hence, we reject the null hypothesis of no autocorrelation in the disturbances.
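As a cross-check (not part of the original tutorial), the lmtest package, if installed, computes the same Breusch-Godfrey statistic in one call. Note that bgtest() sets the pre-sample lagged residuals to zero instead of dropping the first observation, so its statistic may differ marginally from the hand-computed NR2 above.

library(lmtest)
bgtest(gas~income+price+price2+princ,order=1)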
Test for ARCH Errors

For a brief introduction to ARCH processes, see the Econ 508 web page (e-TA). To test for ARCH errors in R, you can use an LM test as follows:

(i) Run OLS on your original equation:

model2<-lm(gas~income+price+price2+princ)

(ii) Generate the residuals and the squared residuals:

uhat2<-(model$resid)^2
uhat2<-ts(uhat2,start=1959,frequency=4)

(iii) Regress the squared residuals on the explanatory variables of the original model (income, price, price2, princ, constant) and the lagged squared residuals. Call this an auxiliary regression:

f<-dyn$lm(uhat2~lag(uhat2,-1)+lag(uhat2,-2)+lag(uhat2,-3)+lag(uhat2,-4)+price+income+price2+princ)
(iv) From the auxiliary regression, calculate NR2 and compare it with a Chi-squared(q, 5%), where q is the number of included lags of the squared residuals:

R2<-summary(f)$r.squared
R2
[1] 0.7182344
N<-(model$df)+length(model$coef)
N*R2
[1] 91.934

Under the null hypothesis of no ARCH errors, the test statistic NR2 converges asymptotically to a Chi-squared with q degrees of freedom, where q is the number of lags of the squared residuals included in the auxiliary regression. In the case above, q=4, and NR2 = 91.94 > 9.48 = Chi-squared(4, 5%). Therefore, we reject the null hypothesis of no ARCH and admit that our regression presents time-varying variance.
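For convenience, the ARCH LM steps can be wrapped in a small helper function. This is only a sketch, not part of the original tutorial, and it implements the textbook version of the test in which the squared residuals are regressed on their own lags only (without the original regressors), so its NR2 will not coincide exactly with the 91.93 obtained above.

arch.lm.test<-function(resid,lags=4,level=.05){
  u2<-resid^2
  X<-embed(u2,lags+1)                    #column 1 is u2_t, columns 2..lags+1 are its lags
  aux<-lm(X[,1]~X[,-1])                  #auxiliary regression on own lags only
  stat<-nrow(X)*summary(aux)$r.squared   #N*R2
  crit<-qchisq(1-level,df=lags)
  list(statistic=stat,critical=crit,reject=stat>crit)
}
arch.lm.test(model$resid,lags=4)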
Tests for Heteroscedasticity

d.d<-read.table("http://www.econ.uiuc.edu/~econ472/CPS.txt",header=TRUE)
attach(d.d)
exp2<-exp^2

Test 1: White

Here the strategy is as follows:

(i) Run the OLS wages regression:

g<-lm(lnwage~grade+exp+exp2+union)

(ii) Get the residuals:

g.resid<-g$resid

(iii) Generate the squared residuals:

g.resid2<-g.resid^2

(iv) Generate new explanatory variables, in the form of the squares of the explanatory variables and the cross-products of the explanatory variables:

grade2<-grade^2

and generate the remaining squares and cross-products analogously (exp3<-exp^3, exp4<-exp^4, gradexp<-grade*exp, gradexp2<-grade*exp2, gradeuni<-grade*union, expunion<-exp*union, exp2uni<-exp2*union). Because union is a dummy variable, its squared values are equal to the original values, so we don't need to add the squared dummy to the model. Also, squared experience is already in the original model (in the form of exp2), so we don't need to add it to this auxiliary regression.

(v) Regress the squared residuals on a constant, the original explanatory variables, and the set of auxiliary explanatory variables (squares and cross-products) you've just created:

g.final<-lm(g.resid2~grade+exp+exp2+union+grade2+exp4+exp3+gradexp+gradexp2+gradeuni+expunion+exp2uni)

(vi) Get the sample size (N) and the R-squared (R2), and construct the test statistic N*R2:

N<-(g$df)+length(g$coef)
R2<-summary(g.final)$r.squared
N*R2
[1] 10.78813

(vii) Under the null hypothesis, the errors are homoscedastic and NR2 is asymptotically distributed as a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. In this case, k=13. We observe that the test statistic NR2 is near 10.79, while the Chi-squared(12, 5%) critical value is about 21.03, much bigger than the test statistic. Hence, the null hypothesis (homoscedasticity) cannot be rejected.
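As another cross-check (assuming the lmtest package, which the original tutorial does not use), bptest() with its default studentize=TRUE reports the N*R2 statistic from regressing the squared residuals on the variables in the second formula, so passing the White regressors should reproduce the statistic computed above.

library(lmtest)
bptest(g,~grade+exp+exp2+union+I(grade^2)+I(exp^3)+I(exp^4)+I(grade*exp)+
         I(grade*exp2)+I(grade*union)+I(exp*union)+I(exp2*union))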
Test 2: Breusch-Pagan-Godfrey

The Lagrange Multiplier test proposed by BPG can be executed as follows:

(i) Run the OLS regression (as you've done above; the output is omitted):

g<-lm(lnwage~grade+exp+exp2+union)

(ii) Get the sum of the squared residuals:

g.resid<-g$resid
g.ssr<-sum((g$resid)^2)
g.ssr

(iii) Generate a disturbance correction factor in the form of the sum of the squared residuals divided by the sample size:

dcf<-g.ssr/((g$df)+length(g$coef))

(iv) Regress the adjusted squared errors (the original squared errors divided by the correction factor) on a list of explanatory variables supposed to influence the heteroscedasticity. Following J&DN, we assume that, from the original dataset, only the main variables grade, exp, and union affect the heteroscedasticity. Hence:

adjerr2<-(g.resid^2)/dcf
g.bptest<-lm(adjerr2~grade+exp+union)
summary(g.bptest)
(v) This auxiliary regression gives you a model (explained) sum of squares:

ess<-sum((g.bptest$fitted-mean(adjerr2))^2)
ess
[1] 10.70477

(vi) Under the null hypothesis of homoscedasticity, (1/2)ESS asymptotically converges to a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. In this case, k=4. Hence, comparing (1/2)ESS with a Chi-squared with 3 degrees of freedom at 5%:

halfESS<-ess/2

The calculated statistic halfESS = 5.35, while the critical value of a Chi-squared(3, 5%) = 7.81. Therefore, the test statistic falls short of the critical value, and the null hypothesis of homoscedasticity cannot be rejected.
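Finally, as a cross-check (assuming lmtest is installed; this is not part of the original tutorial), bptest() with studentize=FALSE returns the original Breusch-Pagan-Godfrey statistic, i.e. one half of the explained sum of squares computed above.

library(lmtest)
bptest(g,~grade+exp+union,studentize=FALSE)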
Last update: September 20, 2007