# e-TA 6: Autocorrelation, ARCH, and Heteroscedasticity

Welcome to the seventh tutorial of Econ 508. This issue summarizes
some well-known tests for autocorrelation and ARCH processes. It draws
on Johnston and DiNardo’s (1997) Econometric Methods and Professor
Koenker’s Lecture 7. It also provides additional support on testing for
heteroscedasticity (see Appendix) and a starting point for those who
want to explore further aspects of ARCH and GARCH processes (see
Perrelli, 2001).¹

# Test for ARCH Errors

To test for ARCH errors, you can use an LM test as follows:

- Run OLS on your original equation:

`qui: regress gas income price price2 priceinc`

- Generate the residuals and the squared residuals:

`predict vhat, resid`

`gen vhat2 = vhat^2`

- Regress squared residuals on the explanatory variables of the original model (income, price, price2, priceinc, constant) and lagged squared residuals. Call this an auxiliary regression.

` qui: regress vhat2 L.vhat2 L2.vhat2 L3.vhat2 L4.vhat2 income price price2 priceinc`

- From the auxiliary regression, calculate NR2 and compare it with the 5% critical value of a Chi-squared with q degrees of freedom, where q is the number of included lags of the squared residuals:

`scalar nR2 = e(N)*e(r2)`

`display "n*R2= " nR2 " and the Chi2 critical value is: " invchi2(4,.95)`

`n*R2= 89.061572 and the Chi2 critical value is: 9.487729`

Under the null hypothesis of no ARCH errors, the test statistic NR2 is asymptotically distributed as a Chi-squared with q degrees of freedom, where q is the number of lags of the squared residuals included in the auxiliary regression. In the case above, q=4, and NR2=89.06 > 9.49 = Chi-squared(4, 5%). Therefore, we reject the null hypothesis of no ARCH and conclude that the errors of our regression have a time-varying conditional variance.
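As a rough cross-check of the manual test, the sketch below computes the p-value for the reported statistic and then calls Stata's built-in `estat archlm`. It assumes the gasoline data are still in memory and have been declared as time series with `tsset`; the scalar names `nR2_arch` and `q` are introduced here for illustration only. The built-in test regresses the squared residuals on a constant and their own lags only, so its statistic need not coincide with the 89.06 obtained above, where the original regressors were also included.

```stata
* p-value and 5% critical value for the statistic reported above (q = 4 lags)
scalar nR2_arch = 89.061572
scalar q = 4
display "p-value:           " chi2tail(q, nR2_arch)
display "5% critical value: " invchi2tail(q, .05)

* Built-in cross-check (assumption: data in memory and already tsset)
qui: regress gas income price price2 priceinc
estat archlm, lags(4)
```

`invchi2tail(q, .05)` returns the same critical value as `invchi2(q, .95)` used above; the tail version is simply written in terms of the significance level.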

# Appendix: Tests for Heteroscedasticity

Under heteroscedastic errors, it is well known that the OLS estimators remain unbiased and consistent, but they are inefficient and the conventional standard errors are incorrect. Hence it is very important to detect this problem in your regression.

We will illustrate how to test for heteroscedasticity using Current Population Survey (CPS) data consisting of 100 observations on wages, educational level, years of experience, and unionization status of U.S. male workers. The data were borrowed from Johnston and DiNardo’s (1997) Econometric Methods, and slightly adjusted for the purposes of this tutorial. The variables are defined as follows:

| Variable | Description |
|---|---|
| lnwage | Log of hourly wage in dollars |
| grade | Highest educational grade completed |
| exp | Years of experience |
| union | Dummy variable: 1 if union member, 0 otherwise |

You can download the data directly from the Econ 508 website:

`webuse "CPS.dta", clear`

`list in 1/6`

| | lnwage | grade | exp | union |
|---|---|---|---|---|
| 1. | 2.331172 | 8 | 22 | 0 |
| 2. | 1.504077 | 14 | 2 | 0 |
| 3. | 3.911523 | 16 | 22 | 0 |
| 4. | 2.197225 | 8 | 34 | 1 |
| 5. | 2.788093 | 9 | 47 | 0 |
| 6. | 2.351375 | 9 | 32 | 0 |

`gen exp2=exp^2`

After you download the data, the next step is to run a “traditional” wage equation involving the variables described above. In Stata, you can do that as follows:

` regress lnwage grade exp exp2 union`

```
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  4,    95) =    0.56
       Model |  2.3434e+69     4  5.8585e+68           Prob > F      =  0.6957
    Residual |  1.0024e+71    95  1.0552e+69           R-squared     =  0.0228
-------------+------------------------------           Adj R-squared = -0.0183
       Total |  1.0259e+71    99  1.0362e+69           Root MSE      = 3.2e+34

------------------------------------------------------------------------------
      lnwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   8.91e+31   9.56e+31     0.93   0.354    -1.01e+32    2.79e+32
         exp |   9.55e+31   7.49e+31     1.28   0.205    -5.31e+31    2.44e+32
        exp2 |   7.88e+29   9.27e+29     0.85   0.398    -1.05e+30    2.63e+30
       union |  -3.55e+31   1.07e+32    -0.33   0.741    -2.48e+32    1.77e+32
       _cons |  -2.68e+33   5.92e+33    -0.45   0.652    -1.44e+34    9.08e+33
------------------------------------------------------------------------------
```

## Test 1: White

Here the strategy is as follows:

- Run the OLS regression (as you’ve done above, the results are omitted):

` qui: regress lnwage grade exp exp2 union`

- Get the residuals:

` predict resid, resid`

- Generate the squared residuals:

` gen resid2 = resid^2`

- Generate new explanatory variables, in the form of the squares of the explanatory variables and the cross-product of the explanatory variables:

`gen grade2 = grade^2`

`gen exp4 = exp2^2`

`gen gradexp = grade*exp`

`gen gradexp2 = grade*exp2`

`gen gradeuni = grade*union`

`gen exp3 = exp*exp2`

`gen expunion = exp*union`

`gen exp2uni = exp2*union`

Because union is a dummy variable, its squared values are equal to the original values, and we don’t need to add the squared dummy in the model. Also the squared experience was already in the original model (in the form of exp2), so we don’t need to add that in this auxiliary regression.

- Regress the squared residuals on a constant, the original explanatory variables, and the set of auxiliary explanatory variables (squares and cross-products) you’ve just created:

`qui: regress resid2 grade exp exp2 union grade2 exp4 exp3 gradexp gradexp2 gradeuni expunion exp2uni`

- Get the sample size (N) and the R-squared (R2), and construct the test statistic N*R2:

`scalar nR2 = e(N)*e(r2)`

`display nR2`

`10.788134`

- Under the null hypothesis, the errors are homoscedastic, and NR2 is asymptotically distributed as a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression (including the constant). In this case, k=13.

The test statistic NR2 is about 10.79, while the 5% critical value of a Chi-squared with 12 degrees of freedom is about 21.03, well above the test statistic. Hence, the null hypothesis of homoscedasticity cannot be rejected.
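The comparison above can also be reproduced numerically. The following is a minimal sketch, assuming the CPS data and `exp2` are still in memory; the scalar names `nR2_white` and `df_white` are hypothetical. It computes the critical value and p-value for the reported statistic and then runs Stata's built-in White test, `estat imtest, white`, after the original regression. The built-in command builds its own squares and cross-products and drops redundant terms, so it should be read as a cross-check rather than as the procedure followed above.

```stata
* Statistic and degrees of freedom reported above
scalar nR2_white = 10.788134
scalar df_white  = 12
display "5% critical value: " invchi2tail(df_white, .05)
display "p-value:           " chi2tail(df_white, nR2_white)

* Built-in White test (assumption: CPS data and exp2 in memory)
qui: regress lnwage grade exp exp2 union
estat imtest, white
```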

## Test 2: Breusch-Pagan-Godfrey

The Lagrange Multiplier test proposed by BPG can be executed as follows:

- Run the OLS regression (as you’ve done above, the output is omitted):

` qui: regress lnwage grade exp exp2 union`

- Get the sum of the squared residuals:

`predict error, resid`

`matrix accum E=error`

`matrix list E`

`* Or obtain directly from regression output`

`dis e(rss)`

`symmetric E[2,2]`

| | error | _cons |
|---|---|---|
| error | 20.989384 | |
| _cons | 4.470e-08 | 100 |

- Generate a disturbance correction factor in the form of sum of the squared residuals divided by the sample size:

`scalar sigmahat=e(rss)/e(N)`

`dis sigmahat`

`.20989384`

- Regress the adjusted squared errors (the original squared errors divided by the correction factor) on a list of explanatory variables thought to drive the heteroscedasticity. Following Johnston and DiNardo, we will assume that, from the original dataset, only the main variables grade, exp, and union affect the heteroscedasticity. Hence:

`gen adjerr2=(error^2)/sigmahat`

`regress adjerr2 grade exp union`

| Source | SS | df | MS |
|---|---|---|---|
| Model | 10.7047726 | 3 | 3.56825754 |
| Residual | 239.425216 | 96 | 2.49401266 |
| Total | 250.129988 | 99 | 2.52656554 |

Number of obs = 100, F(3, 96) = 1.43, Prob > F = 0.2386, R-squared = 0.0428, Adj R-squared = 0.0129, Root MSE = 1.5792

| adjerr2 | Coef. | Std. Err. | t | P>\|t\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| grade | .0989441 | .0643511 | 1.54 | 0.127 | -.0287919 | .2266801 |
| exp | .0099537 | .0131975 | 0.75 | 0.453 | -.0162432 | .0361506 |
| union | -.5824294 | .3963325 | -1.47 | 0.145 | -1.369143 | .2042844 |
| _cons | -.3260997 | .9492019 | -0.34 | 0.732 | -2.210251 | 1.558051 |

This auxiliary regression gives you a model sum of squares (ESS):

` scalar ESS=e(mss) `

- Under the null hypothesis of homoscedasticity, (1/2) ESS is asymptotically distributed as a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. In this case, k=4, so we compare (1/2) ESS with the 5% critical value of a Chi-squared with 3 degrees of freedom. Here (1/2) ESS = 10.70/2 = 5.35, while the critical value is 7.81. The test statistic falls short of the critical value, so the null hypothesis of homoscedasticity cannot be rejected.
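The arithmetic in the last step can be sketched as follows. The scalar `bpg` is a name introduced here for illustration, and the second part assumes the CPS data and `exp2` are still in memory. `estat hettest` with a variable list runs the Breusch-Pagan test against those variables after the original regression; under its default (normality) version it should give a comparable statistic, but treat it as a cross-check rather than a guaranteed match.

```stata
* Half the model sum of squares from the auxiliary regression above
scalar ESS = 10.7047726
scalar bpg = ESS/2
display "BPG statistic:     " bpg
display "5% critical value: " invchi2tail(3, .05)
display "p-value:           " chi2tail(3, bpg)

* Built-in cross-check (assumption: CPS data and exp2 in memory)
qui: regress lnwage grade exp exp2 union
estat hettest grade exp union
```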

## Test 3: Goldfeld-Quandt

Suppose now you believe a single explanatory variable is responsible for most of the heteroscedasticity in your model. For example, let’s say that experience (exp) is the “trouble-maker” variable. You can then proceed with the Goldfeld-Quandt test as follows:

Sort your data according to the variable exp. Then divide your data into, say, three parts, drop the observations in the central part, and run separate regressions for the bottom part (Regression 1) and the top part (Regression 2). After each regression, store the respective Residual Sum of Squares (RSS).

Then compute the ratio of the Residual Sums of Squares, R = RSS2/RSS1. Under the null hypothesis of homoscedasticity, this ratio R is distributed as \(F\left(\frac{n-c-2k}{2},\frac{n-c-2k}{2}\right)\), where n is the sample size, c is the number of dropped observations, and k is the number of estimated coefficients in the model (including the constant).

This is left for the reader as an exercise. To check your results, you should get \(R < F\), and as a consequence you cannot reject the null hypothesis of homoscedasticity.
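For readers who want to compare their own attempt with a worked outline, here is a minimal sketch of the Goldfeld-Quandt steps. The split is an assumption made for illustration: with n = 100, it keeps the 35 observations with the lowest exp and the 35 with the highest, dropping the middle c = 30, and it takes k = 5 coefficients (grade, exp, exp2, union, constant). It also assumes the CPS data and `exp2` are in memory. Other splits are equally legitimate and will give different numbers.

```stata
* Assumed split: n = 100, middle c = 30 observations dropped, k = 5 coefficients
sort exp, stable                       // stable keeps tied observations in a fixed order

* Regression 1: bottom part (lowest experience)
qui: regress lnwage grade exp exp2 union in 1/35
scalar rss1 = e(rss)

* Regression 2: top part (highest experience)
qui: regress lnwage grade exp exp2 union in 66/100
scalar rss2 = e(rss)

* Test statistic and 5% critical value with (n-c-2k)/2 degrees of freedom each
scalar R  = rss2/rss1
scalar df = (100 - 30 - 2*5)/2
display "R = " R
display "5% critical value of F(" df "," df ") = " invFtail(df, df, .05)
```

With this split each subsample has 35 observations and 5 estimated coefficients, which gives the 30 degrees of freedom used in both the numerator and the denominator.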

¹ Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu.