Welcome to the seventh tutorial of Econ 536. In this issue I summarize some well-known tests for autocorrelation and ARCH processes. I draw on Johnston and DiNardo’s (1997) Econometric Methods and Professor Koenker’s Lecture 7. We also provide additional material on testing for heteroscedasticity (see Appendix) and a starting point for those who want to explore further aspects of ARCH and GARCH processes (see Perrelli, 2001).
To test for ARCH errors, you can use an LM test as follows:
model2<-lm(gas~income+price+price2+princ)
uhat2<-(model2$resid)^2
uhat2<-ts(uhat2,start=1959,frequency=4)
library(dyn)  # provides dyn$lm, which handles lag() terms in formulas
f<-dyn$lm(uhat2~lag(uhat2,-1)+lag(uhat2,-2)+lag(uhat2,-3)+lag(uhat2,-4)+price+income+price2+princ)
R2<-summary(f)$r.squared
n<-(model2$df)+length(model2$coef)  # sample size N
n*R2
[1] 91.934
Under the null hypothesis of no ARCH errors, the test statistic NR2 converges asymptotically to a Chi-squared with q degrees of freedom, where q is the number of lags of the squared residuals included in the auxiliary regression. In the case above, q=4, and NR2=91.93 > 9.49 = Chi-squared(4, 5%). Therefore, we reject the null hypothesis of no ARCH and conclude that the errors display time-varying variance.
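For reference, the critical value and the asymptotic p-value used above can be computed directly in R:

```r
# 5% critical value of a Chi-squared with q = 4 degrees of freedom
qchisq(0.95, df = 4)            # about 9.49
# asymptotic p-value of the observed statistic NR^2 = 91.934
1 - pchisq(91.934, df = 4)      # essentially zero
```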
Under heteroscedastic errors, it is well known that the OLS estimators remain unbiased and consistent but are inefficient, and the usual OLS standard errors are incorrect. Hence it is very important to detect this anomaly in your regression.
We will illustrate how to test for heteroscedasticity using Current Population Survey (CPS) data consisting of 100 observations on wages, educational level, years of experience, and unionization status of U.S. male workers. The data were borrowed from J&DN’s (1997) Econometric Methods and slightly adjusted for the purposes of this tutorial. The variables are defined as follows:
| Variable | Description |
|---|---|
| lnwage | Log of hourly wage in dollars |
| grade | Highest educational grade completed |
| exp | Years of experience |
| union | Dummy variable: 1 if union member, 0 otherwise |
You can download the data directly from the Econ 536 website:
d.d<-read.table("http://www.econ.uiuc.edu/~econ536/Data/CPS.txt",header=T)
head(d.d)
lnwage grade exp union
1 2.331172 8 22 0
2 1.504077 14 2 0
3 3.911523 16 22 0
4 2.197225 8 34 1
5 2.788093 9 47 0
6 2.351375 9 32 0
lnwage<-d.d$lnwage
grade<-d.d$grade
exp<-d.d$exp
union<-d.d$union
exp2<-exp^2
After you download the data, the next step is to run a “traditional” wages equation involving the variables described above. In R, you can do that as follows:
summary(lm(lnwage~grade+exp+exp2+union))
Call:
lm(formula = lnwage ~ grade + exp + exp2 + union)
Residuals:
Min 1Q Median 3Q Max
-1.01553 -0.28642 -0.04438 0.29378 1.45359
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5951062 0.2834855 2.099 0.038447 *
grade 0.0835426 0.0200928 4.158 7.04e-05 ***
exp 0.0502742 0.0141370 3.556 0.000589 ***
exp2 -0.0005617 0.0002879 -1.951 0.053954 .
union 0.1659285 0.1244544 1.333 0.185639
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.47 on 95 degrees of freedom
Multiple R-squared: 0.3718, Adjusted R-squared: 0.3453
F-statistic: 14.06 on 4 and 95 DF, p-value: 4.794e-09
Here the strategy (White’s test) is to regress the squared OLS residuals on the original regressors, their squares, and their cross-products:
g<-lm(lnwage~grade+exp+exp2+union)
g.resid<-g$resid
g.resid2<-g.resid^2
grade2<-grade^2
exp4<-exp2^2
gradexp<-grade*exp
gradexp2<-grade*exp2
gradeuni<-grade*union
exp3<-exp*exp2
expunion<-exp*union
exp2uni<-exp2*union
Because union is a dummy variable, its square equals the original variable, so we don’t need to add the squared dummy to the model. Also, squared experience was already in the original model (as exp2), so we don’t need to add it again to this auxiliary regression.
g.final<-lm(g.resid2~grade+exp+exp2+union+grade2+exp4+exp3+gradexp +gradexp2+gradeuni+expunion+exp2uni)
N<-(g$df)+length(g$coef)
R2<-summary(g.final)$r.squared
N*R2
[1] 10.78813
And we observe that the test statistic NR2 is about 10.79, while the 5% critical value of a Chi-squared(12) is about 21.03, much bigger than the test statistic. Hence, the null hypothesis (homoscedasticity) cannot be rejected.
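As a sanity check, the Chi-squared comparison can be done directly in R (12 is the number of regressors in the auxiliary regression):

```r
NR2 <- 10.78813                  # test statistic from above
qchisq(0.95, df = 12)            # 5% critical value, about 21.03
1 - pchisq(NR2, df = 12)         # asymptotic p-value, well above 0.05
```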
The Lagrange Multiplier test proposed by BPG can be executed as follows:
g<-lm(lnwage~grade+exp+exp2+union)
g.resid<-g$resid
g.ssr<-sum((g$resid)^2)
g.ssr
[1] 20.98938
dcf<-g.ssr/((g$df)+length(g$coef))
adjerr2<-(g.resid^2)/dcf
g.bptest<-lm(adjerr2~grade+exp+union)
summary(g.bptest)
Call:
lm(formula = adjerr2 ~ grade + exp + union)
Residuals:
Min 1Q Median 3Q Max
-1.5484 -0.8613 -0.4512 0.2889 8.7774
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.326100 0.949202 -0.344 0.732
grade 0.098944 0.064351 1.538 0.127
exp 0.009954 0.013198 0.754 0.453
union -0.582429 0.396332 -1.470 0.145
Residual standard error: 1.579 on 96 degrees of freedom
Multiple R-squared: 0.0428, Adjusted R-squared: 0.01288
F-statistic: 1.431 on 3 and 96 DF, p-value: 0.2386
This auxiliary regression gives you an explained sum of squares (ESS):
ess<-sum((g.bptest$fitted-mean(adjerr2))^2)
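The BPG statistic is one half of this ESS; under the null of homoscedasticity it is asymptotically Chi-squared with p − 1 degrees of freedom, where p is the number of parameters in the auxiliary regression (here p − 1 = 3). A self-contained sketch of the whole calculation, following the steps above:

```r
# Breusch-Pagan-Godfrey by hand, recomputed from scratch
d.d <- read.table("http://www.econ.uiuc.edu/~econ536/Data/CPS.txt", header = TRUE)
g <- lm(lnwage ~ grade + exp + I(exp^2) + union, data = d.d)
sigma2 <- sum(resid(g)^2)/nrow(d.d)        # ML estimate of the error variance
adjerr2 <- resid(g)^2/sigma2               # scaled squared residuals
aux <- lm(adjerr2 ~ grade + exp + union, data = d.d)
bp <- sum((fitted(aux) - mean(adjerr2))^2)/2   # ESS/2: the BPG statistic
qchisq(0.95, df = 3)                       # 5% critical value, about 7.81
1 - pchisq(bp, df = 3)                     # asymptotic p-value
```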
Suppose now you believe a single explanatory variable is responsible for most of the heteroscedasticy in your model. For example, let’s say that experience (exp) is the “trouble-maker” variable. Hence, you can proceed with the Goldfeld-Quandt test as follows:
Sort your data according to the variable exp. Then divide your data in, say, three parts, drop the observations of the central part, and run separate regressions for the bottom part (Regression 1) and the top part (Regression 2). After each regression, ask for the respective Residual Sum of Squares RSS:
Then compute the ratio of the Residual Sums of Squares, R = RSS2/RSS1. Under the null hypothesis of homoscedasticity, this ratio R is distributed according to an \(F_{\left(\frac{n-c-2k}{2},\frac{n-c-2k}{2}\right)}\), where n is the sample size, c is the number of dropped observations, and k is the number of regressors in the model.
This is left as an exercise for the reader. To check your results, you should get \(R < F\), and as a consequence you cannot reject the null hypothesis of homoscedasticity.
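The steps above can be sketched as follows; the choice of dropping the middle third is one common convention, not the only one:

```r
# Goldfeld-Quandt by hand: sort by the suspect variable, drop the
# central observations, and compare the tail regressions' RSS.
d.d <- read.table("http://www.econ.uiuc.edu/~econ536/Data/CPS.txt", header = TRUE)
d.s <- d.d[order(d.d$exp), ]               # sort by exp
n <- nrow(d.s)
m <- floor(n/3)                            # tail size; drops c = n - 2m central obs
rss <- function(d) sum(resid(lm(lnwage ~ grade + exp + I(exp^2) + union, data = d))^2)
R <- rss(d.s[(n - m + 1):n, ])/rss(d.s[1:m, ])   # RSS2/RSS1
R
qf(0.95, m - 5, m - 5)                     # 5% critical value; 5 parameters incl. intercept
```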
The three heteroscedasticity tests presented here are classical ones, so there are very likely packages that already compute them for you. One such package is lmtest. For example, you can run the Breusch-Pagan-Godfrey test by typing:
require(lmtest)
Loading required package: lmtest
bptest(lnwage~grade+exp+exp2+union, studentize=FALSE)
Breusch-Pagan test
data: lnwage ~ grade + exp + exp2 + union
BP = 6.1161, df = 4, p-value = 0.1906
Note that the results are somewhat different, but this has to do with how the bptest function was written. However, the conclusion does not change. Another thing to note is the option studentize=FALSE: the default for this function is TRUE, in which case Prof. Koenker’s studentized version of the test statistic is used.
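For comparison, you can also run bptest with its default studentize=TRUE (the data-loading line is included so the snippet stands alone):

```r
library(lmtest)
d.d <- read.table("http://www.econ.uiuc.edu/~econ536/Data/CPS.txt", header = TRUE)
# studentize = TRUE is the default, giving Koenker's studentized statistic
bptest(lnwage ~ grade + exp + I(exp^2) + union, data = d.d)
```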
You can also run a Goldfeld-Quandt test and check whether the results you obtained by following the steps above coincide with the output of the gqtest function included in the package:
gqtest(lnwage~grade+exp+exp2+union)
Goldfeld-Quandt test
data: lnwage ~ grade + exp + exp2 + union
GQ = 1.4923, df1 = 45, df2 = 45, p-value = 0.09161
Please send comments to hrtdmrt2@illinois.edu or srmntbr2@illinois.edu.