Econ 508 Econometrics Group

Applied Econometrics
Econ 508 - Fall 2007

e-Tutorial 7: Autocorrelation, ARCH, and Heteroscedasticity

Welcome to the seventh tutorial of Econ 508. In this issue I summarize some well-known tests for autocorrelation and ARCH processes, drawing on Johnston and DiNardo's (1997) Econometric Methods and Professor Koenker's Lecture 7. I also provide additional material on testing for heteroscedasticity (see Appendix) and a starting point for those who want to explore further aspects of ARCH and GARCH processes. This document was prepared by Roberto Perrelli.

Tests for Autocorrelated Errors

Background:

If you run a regression without lagged dependent variables and detect autocorrelation, your OLS estimators are unbiased and consistent, but they are inefficient and their conventional standard errors are incorrect. If you include lagged dependent variables among the covariates and still detect autocorrelation, you are in bigger trouble: the OLS estimators are then inconsistent.

To test for the presence of autocorrelation, you have a large menu of options. Here I suggest the Breusch-Godfrey test, and I will show how to implement it using the dataset AUTO2.dta, available from the Econ 508 web page (Data) in .dta format (STATA users) or ASCII format (R users).

Test: Breusch-Godfrey

Background: Suppose you are running a version of model (2), problem set 2, in which the original data is replaced by AUTO2. Then your model is:

gas_t = b0 + b1*income_t + b2*price_t + b3*(price_t)^2 + b4*(price_t*income_t) + u_t

and you wish to test whether the disturbances are autocorrelated. The steps to do that are as follows:

(i) Run an OLS in your original equation:

sort quarter
gen t=_n
label variable t "Integer time period"
tsset t
time variable:  t, 1 to 128

gen price2= price*price
gen priceinc= price*income
regress gas income price price2 priceinc

Source |       SS       df       MS                  Number of obs =     128
---------+------------------------------               F(  4,   123) =  117.59
Model |  1.45421455     4  .363553638               Prob > F      =  0.0000
Residual |   .38027632   123  .003091677               R-squared     =  0.7927
Total |  1.83449087   127   .01444481               Root MSE      =   .0556

------------------------------------------------------------------------------
gas |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
income |  -6.109449   2.750048     -2.222   0.028        -11.553   -.6658979
price |   3.090076   2.894747      1.067   0.288      -2.639898     8.82005
price2 |   .2839261   .1693137      1.677   0.096        -.05122    .6190723
priceinc |   1.454257   .5879186      2.474   0.015        .290508    2.618006
_cons |  -25.52231    11.8968     -2.145   0.034       -49.0713   -1.973313
------------------------------------------------------------------------------

(ii)  Obtain the estimated residuals:
predict uhat, res

(iii)  Regress the estimated residuals (uhat) on the explanatory variables of the original model (income, price, price2, priceinc, constant) and lagged residuals (L.uhat). Call this the auxiliary regression.

regress  uhat income price price2 priceinc L.uhat

Source |       SS       df       MS                  Number of obs =     127
---------+------------------------------               F(  5,   121) =  242.06
Model |  .344333245     5  .068866649               Prob > F      =  0.0000
Residual |   .03442503   121  .000284504               R-squared     =  0.9091
Total |  .378758275   126  .003006018               Root MSE      =  .01687

------------------------------------------------------------------------------
uhat     |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
income   |   .4172861   .8343297      0.500   0.618       -1.23449    2.069062
price    |  -.5490077   .8782736     -0.625   0.533      -2.287782    1.189766
price2   |   .0171764   .0513647      0.334   0.739      -.0845136    .1188663
priceinc |   -.091186   .1783701     -0.511   0.610      -.4443168    .2619447
uhat     |
L1 |   .9662016   .0277751     34.787   0.000       .9112136     1.02119
_cons    |   2.153221   3.609468      0.597   0.552      -4.992673    9.299114
------------------------------------------------------------------------------

(iv) From the auxiliary regression above, obtain the R-squared and multiply it by the number of included observations:

scalar N=_result(1)
scalar R2=_result(7)
scalar NR2= N*R2
scalar list N  R2  NR2
N =        127
R2 =  .90911082
NR2 =  115.45707

(v)  Under the null hypothesis of no autocorrelation, the test statistic NR2 converges asymptotically to a Chi-squared with s degrees of freedom, where s is the number of lags of the residuals included in the auxiliary regression. In the case above, s=1, and we have:

scalar chi15=invchi(1, .05)
scalar list chi15
chi15 =  3.8414598

In the example above, NR2 = 115.45 > 3.84 = Chi-squared(1, 5%). Hence, we reject the null hypothesis of no autocorrelation in the disturbances.
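If you want a language-neutral cross-check of this recipe, the NR2 statistic can be computed by hand. Below is a minimal Python/numpy sketch of the same steps on simulated data with AR(1) errors; the model, variable names, and parameter values are illustrative, not the tutorial's gasoline series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a regression with AR(1) errors (rho = 0.9), so the
# Breusch-Godfrey test should reject the null of no autocorrelation.
n = 128
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.9 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

def ols_resid(yy, XX):
    """Return OLS residuals of yy on XX (XX includes the constant)."""
    beta, *_ = np.linalg.lstsq(XX, yy, rcond=None)
    return yy - XX @ beta

# Steps (i)-(ii): original regression and its residuals.
X = np.column_stack([np.ones(n), x])
uhat = ols_resid(y, X)

# Step (iii): auxiliary regression of uhat on X and one lag of uhat
# (dropping the first observation, as Stata's L.uhat does).
Z = np.column_stack([X[1:], uhat[:-1]])
e = ols_resid(uhat[1:], Z)

# Steps (iv)-(v): NR^2 ~ chi-squared(s), s = 1 lag; 5% critical value 3.84.
tss = (uhat[1:] - uhat[1:].mean()) @ (uhat[1:] - uhat[1:].mean())
r2 = 1 - (e @ e) / tss
nr2 = (n - 1) * r2
```

With strongly autocorrelated errors the statistic far exceeds the 3.84 critical value, mirroring the rejection above.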

Test for ARCH Errors

For a brief introduction to ARCH processes, visit the Econ 508 web page (e-TA). To test for ARCH errors, you can use an LM test as follows:

(i) Run an OLS in your original equation:

regress   gas   income   price   price2   priceinc

Source |       SS       df       MS                  Number of obs =     128
---------+------------------------------               F(  4,   123) =  117.59
Model |  1.45421455     4  .363553638               Prob > F      =  0.0000
Residual |   .38027632   123  .003091677               R-squared     =  0.7927
Total |  1.83449087   127   .01444481               Root MSE      =   .0556

------------------------------------------------------------------------------
gas |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
income |  -6.109449   2.750048     -2.222   0.028        -11.553   -.6658979
price |   3.090076   2.894747      1.067   0.288      -2.639898     8.82005
price2 |   .2839261   .1693137      1.677   0.096        -.05122    .6190723
priceinc |   1.454257   .5879186      2.474   0.015        .290508    2.618006
_cons |  -25.52231    11.8968     -2.145   0.034       -49.0713   -1.973313
------------------------------------------------------------------------------

(ii) Generate the residuals and the squared residuals.

predict uhat, res
gen uhat2=uhat^2

(iii) Regress squared residuals on the explanatory variables of the original model (income, price, price2, priceinc, constant) and lagged squared residuals. Call this an auxiliary regression.

regress uhat2 L.uhat2 L2.uhat2 L3.uhat2 L4.uhat2 income price price2 priceinc

Source |       SS       df       MS                  Number of obs =     124
---------+------------------------------               F(  8,   115) =   36.64
Model |  .000798428     8  .000099803               Prob > F      =  0.0000
Residual |  .000313219   115  2.7236e-06               R-squared     =  0.7182
Total |  .001111647   123  9.0378e-06               Root MSE      =  .00165

------------------------------------------------------------------------------
uhat2    |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
uhat2    |
L1 |   .7754884   .0922908      8.403   0.000        .592678    .9582988
L2 |  -.0635221   .1168995     -0.543   0.588      -.2950774    .1680332
L3 |  -.0939969   .1203595     -0.781   0.436      -.3324058     .144412
L4 |    .169538   .0963056      1.760   0.081      -.0212248    .3603008
income   |   .0191215   .0873065      0.219   0.827      -.1538158    .1920588
price    |   .1035899   .0891867      1.161   0.248      -.0730718    .2802516
price2   |  -.0124098   .0055696     -2.228   0.028       -.023442   -.0013776
priceinc |  -.0035345    .018658     -0.189   0.850      -.0404924    .0334234
_cons    |  -.2011828    .368725     -0.546   0.586      -.9315559    .5291904
------------------------------------------------------------------------------

(iv) From the auxiliary regression, calculate NR2 and compare it with a Chi-squared(q, 5%), where q is the number of included lags of the squared residuals:

scalar N=_result(1)
scalar R2=_result(7)
scalar NR2= N*R2
scalar list N  R2  NR2
N =        124
R2 =  .71823848
NR2 =  89.061572

scalar chi45=invchi(4, .05)
scalar list chi45
chi45 =  9.4877332

Under the null hypothesis of no ARCH errors, the test statistic NR2 converges asymptotically to a Chi-squared with q degrees of freedom, where q is the number of lags of the squared residuals included in the auxiliary regression. In the case above, q=4, and NR2 = 89.06 > 9.49 = Chi-squared(4, 5%). Therefore, we reject the null hypothesis of no ARCH, and conclude that our regression exhibits time-varying variance.
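The ARCH LM steps above can also be sketched by hand. Here is a minimal Python/numpy version on simulated ARCH(1) errors; the sample size and parameter values are illustrative, not from the tutorial data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate ARCH(1) errors: u_t = sigma_t * z_t with
# sigma_t^2 = 0.2 + 0.5 * u_{t-1}^2.
n = 500
u = np.zeros(n)
for t in range(1, n):
    u[t] = np.sqrt(0.2 + 0.5 * u[t - 1] ** 2) * rng.normal()

q = 4                  # lags of squared residuals, as in the tutorial
u2 = u ** 2
y = u2[q:]

# Auxiliary regression: u2_t on a constant and u2_{t-1}, ..., u2_{t-q}.
Z = np.column_stack([np.ones(n - q)] +
                    [u2[q - j:n - j] for j in range(1, q + 1)])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ beta

# NR^2 ~ chi-squared(q) under the null; chi2(4) 5% critical value = 9.49.
r2 = 1 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
nr2 = (n - q) * r2
```

Because the simulated variance really does depend on lagged squared errors, the statistic rejects the null, just as in the gasoline example.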

Appendix: Tests for Heteroscedasticity

Under heteroscedastic errors, it is well known that OLS estimators are unbiased and consistent, but inefficient and provide incorrect standard errors. Hence it is very important to detect this anomaly in your regression.

I will illustrate how to test for heteroscedasticity using Current Population Survey (CPS) data consisting of 100 observations on wages, educational level, years of experience, and unionization status of U.S. male workers. The data were borrowed from J&DN's (1997) Econometric Methods and slightly adjusted for the purposes of this tutorial. The variables are defined as follows:

Variable   Description
lnwage     Log of hourly wage in dollars
grade      Highest educational grade completed
exp        Years of experience
union      Dummy variable: 1 if union member, 0 otherwise

The data is available from the Econ 508 web page (Data) in both STATA (.dta) and R (ASCII) formats.

After you download the data, the next step is to run a "traditional" wage equation involving the variables described above. In STATA, you can do that as follows:

*Generate the variable experience squared:
gen exp2=exp^2

*Run the wages equation:
regress  lnwage grade exp exp2 union

Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------               F(  4,    95) =   14.06
Model |  12.4223593     4  3.10558981               Prob > F      =  0.0000
Residual |  20.9893844    95  .220940888               R-squared     =  0.3718
Total |  33.4117436    99   .33749236               Root MSE      =  .47004

------------------------------------------------------------------------------
lnwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
grade |   .0835426   .0200928      4.158   0.000       .0436534    .1234318
exp |   .0502742    .014137      3.556   0.001       .0222087    .0783397
exp2 |  -.0005617   .0002879     -1.951   0.054      -.0011332    9.74e-06
union |   .1659286   .1244544      1.333   0.186      -.0811446    .4130017
_cons |   .5951063   .2834855      2.099   0.038       .0323165    1.157896
------------------------------------------------------------------------------

Test 1: White

Here the strategy is as follows:
(i) Run the OLS regression (as you've done above; the results are omitted):
regress  lnwage grade exp exp2 union

(ii) Get the residuals:
predict error, resid

(iii) Generate the squared residuals:
gen error2=error^2

(iv) Generate new explanatory variables, in the form of the squares of the explanatory variables and the cross-product of the explanatory variables:

gen grade2=grade^2
gen exp4=exp2^2
gen exp3=exp*exp2
gen gradexp=grade*exp
gen gradexp2=grade*exp2
gen gradeuni=grade*union
gen expunion=exp*union
gen exp2uni=exp2*union

Because union is a dummy variable, its squared values equal the original values, so we don't need to add the squared dummy to the model. Also, squared experience is already in the original model (as exp2), so we don't need to add it again to this auxiliary regression.

(v) Regress the squared residuals on a constant, the original explanatory variables, and the set of auxiliary explanatory variables (squares and cross-products) you've just created:

regress error2 grade exp exp2 union grade2 exp4 exp3 gradexp gradexp2 gradeuni expunion exp2uni

Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------              F( 12,    87) =    0.88
Model |  1.18880733    12  .099067278               Prob > F      =  0.5731
Residual |   9.8307755    87   .11299742               R-squared     =  0.1079
Total |  11.0195828    99  .111308918               Root MSE      =  .33615

------------------------------------------------------------------------------
error2 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
grade |  -.0122003   .1250207     -0.098   0.922      -.2606924    .2362919
exp |    .077838   .0718804      1.083   0.282      -.0650321    .2207081
exp2 |  -.0039901   .0040948     -0.974   0.333       -.012129    .0041488
union |   .6487872   .8615962      0.753   0.453      -1.063729    2.361303
grade2 |   .0021956   .0042473      0.517   0.607      -.0062464    .0106377
exp4 |  -3.34e-07   1.51e-06     -0.221   0.826      -3.34e-06    2.67e-06
exp3 |   .0000617   .0001419      0.435   0.665      -.0002204    .0003438
gradexp |  -.0037523   .0049422     -0.759   0.450      -.0135755    .0060709
gradexp2 |   .0001168    .000111      1.052   0.296      -.0001038    .0003375
gradeuni |  -.0513743   .0443036     -1.160   0.249      -.1394324    .0366839
expunion |   .0019327   .0606141      0.032   0.975      -.1185444    .1224097
exp2uni |  -.0002219   .0012589     -0.176   0.861      -.0027241    .0022804
_cons |   -.077672   .9858038     -0.079   0.937      -2.037064     1.88172
------------------------------------------------------------------------------

(vi) Get the sample size (N) and the R-squared (R2), and construct the test statistic N*R2:

scalar N=_result(1)
scalar R2=_result(7)
scalar NR2=N*R2
scalar list N R2 NR2
N =        100
R2 =  .10788134
NR2 =  10.788134

(vii) Under the null hypothesis the errors are homoscedastic, and NR2 is asymptotically distributed as a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. Here k=13, so we compare NR2 with a Chi-squared with 12 degrees of freedom at the 5% level:

scalar chi125=invchi(12, .05)
scalar list chi125
chi125 =  21.026075

We observe that the test statistic NR2 is about 10.79, while the Chi-squared(12, 5%) critical value is about 21.03, much bigger than the test statistic. Hence the null hypothesis (homoscedasticity) cannot be rejected.
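To see the mechanics of White's test outside Stata, here is a minimal Python/numpy sketch on simulated data; this time the error variance genuinely depends on a regressor, so the test rejects. Variable names and parameter values are illustrative, not from the CPS data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a regression whose error variance grows with x1, so White's
# test should reject homoscedasticity.
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + x1 ** 2)   # heteroscedastic errors
y = 1 + 0.5 * x1 - 0.3 * x2 + u

# Original regression and its squared residuals.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta) ** 2

# Auxiliary regressors: levels, squares, and cross-products.
W = np.column_stack([np.ones(n), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
g, *_ = np.linalg.lstsq(W, e2, rcond=None)
r = e2 - W @ g

# NR^2 with k = 6 coefficients: compare with chi-squared(5), 5% value 11.07.
r2 = 1 - (r @ r) / ((e2 - e2.mean()) @ (e2 - e2.mean()))
nr2 = n * r2
```

In contrast with the wage equation above, here the squared residuals are well explained by the squares of the regressors, so NR2 comfortably exceeds the critical value.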

Test 2: Breusch-Pagan-Godfrey

The Lagrange Multiplier test proposed by BPG can be executed as follows:

(i) Run the OLS regression (as you've done above; the output is omitted):
regress  lnwage grade exp exp2 union

(ii) Get the sum of the squared residuals:
predict error, resid
matrix accum E=error
matrix list E
symmetric E[2,2]
error      _cons
error  20.989384
_cons  4.470e-08        100

(iii) Generate a disturbance correction factor in the form of sum of the squared residuals divided by the sample size:

scalar N=_result(1)
scalar sigmahat=el(E,1,1)/N
scalar list N sigmahat
N =        100
sigmahat =  .20989384

(iv) Regress the adjusted squared errors (the original squared errors divided by the correction factor) on a list of explanatory variables thought to influence the heteroscedasticity. Following J&DN, we will assume that, from the original dataset, only the variables grade, exp, and union affect the heteroscedasticity. Hence:

gen adjerr2=error2/sigmahat
regress adjerr2 grade exp union

Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------               F(  3,    96) =    1.43
Model |  10.7047727     3  3.56825756               Prob > F      =  0.2386
Residual |  239.425216    96  2.49401266               R-squared     =  0.0428
Total |  250.129988    99  2.52656554               Root MSE      =  1.5792

------------------------------------------------------------------------------
adjerr2 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
grade |   .0989441   .0643511      1.538   0.127      -.0287919    .2266801
exp |   .0099537   .0131975      0.754   0.453      -.0162432    .0361506
union |  -.5824294   .3963325     -1.470   0.145      -1.369143    .2042844
_cons |  -.3260997   .9492019     -0.344   0.732      -2.210251    1.558051
------------------------------------------------------------------------------

(v) This auxiliary regression gives a model (explained) sum of squares (ESS) of:
scalar ESS=_result(2)
scalar list ESS
ESS =  10.704773

(vi) Under the null hypothesis of homoscedasticity, (1/2) ESS asymptotically converges to a Chi-squared(k-1), where k is the number of coefficients in the auxiliary regression. Here k=4, so we compare (1/2) ESS with a Chi-squared with 3 degrees of freedom at the 5% level:

scalar halfESS=(1/2)*ESS
scalar chi35=invchi(3, .05)
scalar list halfESS chi35
halfESS =  5.3523863
chi35 =  7.8147277

The calculated statistic halfESS = 5.35, while the critical value of a Chi-squared(3, 5%) = 7.81. The test statistic falls short of the critical value, so the null hypothesis of homoscedasticity cannot be rejected.
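The BPG recipe, including the sigma-hat correction factor, can be sketched by hand as well. Below is an illustrative Python/numpy version on simulated data where the error standard deviation is proportional to a regressor, so (1/2) ESS rejects; names and values are hypothetical, not from the CPS data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate errors whose standard deviation is proportional to z.
n = 400
z = rng.uniform(0.5, 2.0, size=n)
y = 1 + z + rng.normal(size=n) * z

# Step (i)-(ii): original regression, residuals, and RSS.
X = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# Step (iii): correction factor sigma-hat^2 = RSS / n.
sigma2 = (e @ e) / n
adj = e ** 2 / sigma2

# Step (iv): auxiliary regression of adjusted squared errors on z.
G = np.column_stack([np.ones(n), z])
gamma, *_ = np.linalg.lstsq(G, adj, rcond=None)
fit = G @ gamma

# (1/2) ESS ~ chi-squared(1) under the null; 5% critical value 3.84.
ess = (fit - adj.mean()) @ (fit - adj.mean())
half_ess = ess / 2
```

Here the adjusted squared errors rise systematically with z, so half_ess lands well above 3.84, unlike the wage example above.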

Test 3: Goldfeld-Quandt

Suppose now you believe a single explanatory variable is responsible for most of the heteroscedasticity in your model. For example, let's say that experience (exp) is the "trouble-maker" variable. You can then proceed with the Goldfeld-Quandt test as follows:

(i) Sort your data according to the variable exp. Then divide your data into, say, three parts, drop the observations of the central part, and run separate regressions for the bottom part (Regression 1) and the top part (Regression 2). After each regression, record the respective residual sum of squares (RSS):

sort exp
gen index=_n
regress lnwage grade exp exp2 union if  index<36

Source |       SS       df       MS                  Number of obs =      35
---------+------------------------------               F(  4,    30) =    4.92
Model |  4.75921906     4  1.18980476               Prob > F      =  0.0036
Residual |  7.25165511    30  .241721837               R-squared     =  0.3962
Total |  12.0108742    34  .353261005               Root MSE      =  .49165

------------------------------------------------------------------------------
lnwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
grade |   .0723887   .0424869      1.704   0.099       -.014381    .1591585
exp |   .2109046    .109217      1.931   0.063      -.0121462    .4339554
exp2 |   -.009115   .0076041     -1.199   0.240      -.0246446    .0064147
union |    .412672    .368553      1.120   0.272      -.3400136    1.165358
_cons |   .1705752   .6767128      0.252   0.803      -1.211457    1.552607
------------------------------------------------------------------------------

regress  lnwage grade exp exp2 union if  index>65

Source |       SS       df       MS                  Number of obs =      35
---------+------------------------------               F(  4,    30) =    4.72
Model |  4.58734144     4  1.14683536               Prob > F      =  0.0045
Residual |  7.28272478    30  .242757493               R-squared     =  0.3865
Total |  11.8700662    34  .349119595               Root MSE      =   .4927

------------------------------------------------------------------------------
lnwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
grade |   .1364797    .034399      3.968   0.000       .0662276    .2067318
exp |   -.012187   .0781619     -0.156   0.877      -.1718148    .1474409
exp2 |   .0003906   .0011308      0.345   0.732      -.0019188       .0027
union |  -.0737199   .2162468     -0.341   0.736      -.5153548     .367915
_cons |   .9964461   1.235018      0.807   0.426      -1.525797    3.518689
------------------------------------------------------------------------------

(ii) Then compute the ratio of the residual sums of squares, R = RSS2/RSS1. Under the null hypothesis of homoscedasticity, this ratio R is distributed as an F with ((n-c-2k)/2, (n-c-2k)/2) degrees of freedom, where n is the sample size, c is the number of dropped observations, and k is the number of coefficients in each regression. In the example above, n=100, c=30, and k=5, so R ~ F(30, 30), and under the null R should not exceed the 5% critical value:

scalar RSS1=7.25165511
scalar RSS2=7.28272478
scalar R=RSS2/RSS1
scalar list R
R =  1.0042845

scalar F30305=invfprob(30,30,.05)
scalar list F30305
F30305 =  1.8408746

Hence R < F, and we cannot reject the null hypothesis of homoscedasticity.
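The split-sample logic of Goldfeld-Quandt is easy to reproduce by hand. Here is a minimal Python/numpy sketch on simulated homoscedastic data, where the ratio should stay near 1; the sample split mirrors the tutorial's (bottom 35, top 35, middle 30 dropped), but the data and model are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Homoscedastic data sorted by the suspect variable: the GQ ratio
# RSS2/RSS1 should be close to 1.
n = 100
x = np.sort(rng.uniform(0.0, 10.0, size=n))
y = 2.0 + 0.5 * x + rng.normal(size=n)

def rss(yy, XX):
    """Residual sum of squares of an OLS fit of yy on XX."""
    b, *_ = np.linalg.lstsq(XX, yy, rcond=None)
    e = yy - XX @ b
    return e @ e

# Bottom 35 and top 35 observations, dropping the middle 30.
X = np.column_stack([np.ones(n), x])
R = rss(y[65:], X[65:]) / rss(y[:35], X[:35])
# Under homoscedasticity here, R ~ F(33, 33): 35 obs minus 2 coefficients
# in each subsample (the tutorial's F(30, 30) reflects its 5 coefficients).
```

With constant error variance, R stays well inside the acceptance region, just as in the wage example.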

USING R:

library(dyn)

attach(d.gas)

gas<-ts(gas,start=1959,frequency=4)
price<-ts(price,start=1959,frequency=4)
income<-ts(income,start=1959,frequency=4)
miles<-ts(miles,start=1959,frequency=4)
price2<-price^2
princ<-price*income

Test: Breusch-Godfrey

(i) Run an OLS in your original equation:

model<-lm(gas~income+price+price2+princ)

(ii) Obtain the estimated residuals:

uhat<-model$resid
uhat<-ts(uhat,start=1959,frequency=4)

(iii) Regress the estimated residuals (uhat) on the explanatory variables of the original model (income, price, price2, princ, constant) and the lagged residuals. Call this the auxiliary regression:

aux<-dyn$lm(uhat~income+price+price2+princ+lag(uhat,-1))

(iv) From the auxiliary regression above, obtain the R-squared and multiply it by the number of included observations:

#Constructing R2 from the residual and explained sums of squares:
SSR<-sum(aux$resid^2)
SSR
 0.03442517

#Note that R gives you just the fitted values, so you need to subtract their mean in order to get the explained sum of squares.
SSE<-sum((fitted(aux)-mean(fitted(aux)))^2)
SSE
 0.3443454

SST<-SSR+SSE
R2<-SSE/SST
R2
 0.9091105

(v)  Under the null hypothesis of no autocorrelation, the test statistic NR2 converges asymptotically to a Chi-squared with s degrees of freedom, where s is the number of lags of the residuals included in the auxiliary regression. In the case above, s=1, and we have:

N<-127 #Number of observations included in the auxiliary regression
# (residual degrees of freedom plus number of coefficients of the auxiliary regression)

N*R2
 115.457

In the example above, NR2 = 115.457 > 3.84 = Chi-squared(1, 5%). Hence, we reject the null hypothesis of no autocorrelation in the disturbances.

Test for ARCH Errors

For a brief introduction to ARCH processes, see the Econ 508 web page (e-TA). To test for ARCH errors, you can use an LM test as follows:

(i) Run an OLS in your original equation:

model2<-lm(gas~income+price+price2+princ)

(ii) Generate the residuals and the squared residuals.

uhat2<-(model2$resid)^2
uhat2<-ts(uhat2,start=1959,frequency=4)

(iii) Regress squared residuals on the explanatory variables of the original model (income, price, price2, priceinc, constant) and lagged squared residuals. Call this an auxiliary regression.

f<-dyn$lm(uhat2~lag(uhat2,-1)+lag(uhat2,-2)+lag(uhat2,-3)+lag(uhat2,-4)+price+income+price2+princ)

(iv) From the auxiliary regression, calculate NR2 and compare it with a Chi-squared(q, 5%), where q is the number of included lags of the squared residuals:

R2<-summary(f)$r.squared
R2
 0.7182344

n<-124 #Number of observations included in the auxiliary regression

n*R2
 89.0611

Under the null hypothesis of no ARCH errors, the test statistic NR2 converges asymptotically to a Chi-squared with q degrees of freedom, where q is the number of lags of the squared residuals included in the auxiliary regression. In the case above, q=4, and NR2 = 89.06 > 9.49 = Chi-squared(4, 5%). Therefore, we reject the null hypothesis of no ARCH, and conclude that our regression exhibits time-varying variance.

Tests for Heteroscedasticity:

attach(d.d)

exp2<-exp^2

Test 1: White

Here the strategy is as follows:

(i) Run the OLS regression (as you've done above; the results are omitted):

g<-lm(lnwage~grade+exp+exp2+union)

(ii) Get the residuals:

g.resid<-g$resid

(iii) Generate the squared residuals:

g.resid2<-g.resid^2

(iv) Generate new explanatory variables, in the form of the squares of the explanatory variables and the cross-product of the explanatory variables:

grade2<-grade^2
exp4<-exp2^2
exp3<-exp*exp2
gradexp<-grade*exp
gradexp2<-grade*exp2
gradeuni<-grade*union
expunion<-exp*union
exp2uni<-exp2*union

Because union is a dummy variable, its squared values equal the original values, so we don't need to add the squared dummy to the model. Also, squared experience is already in the original model (as exp2), so we don't need to add it again to this auxiliary regression.

(v) Regress the squared residuals on a constant, the original explanatory variables, and the set of auxiliary explanatory variables (squares and cross-products) you've just created:

g.final<-lm(g.resid2~grade+exp+exp2+union+grade2+exp4+exp3+gradexp+gradexp2+gradeuni+expunion+exp2uni)

(vi) Get the sample size (N) and the R-squared (R2), and construct the test statistic N*R2:

N<-(g$df)+length(g$coef)

R2<-summary(g.final)$r.squared

N*R2

 10.78813

(vii) Under the null hypothesis the errors are homoscedastic, and NR2 is asymptotically distributed as a Chi-squared with k-1 degrees of freedom, where k is the number of coefficients in the auxiliary regression. Here k=13, and the 5% critical value is:

qchisq(.95,12)
 21.02607

The test statistic NR2 is about 10.79, below the Chi-squared(12, 5%) critical value of about 21.03. Hence the null hypothesis (homoscedasticity) cannot be rejected.

Test 2: Breusch-Pagan-Godfrey

The Lagrange Multiplier test proposed by BPG can be executed as follows:

(i) Run the OLS regression (as you've done above; the output is omitted):

(ii) Get the sum of the squared residuals:

g.resid<-g$resid

g.ssr<-sum((g$resid)^2)

g.ssr

(iii) Generate a disturbance correction factor in the form of sum of the squared residuals divided by the sample size:

dcf<-g.ssr/((g$df)+length(g$coef))

(iv) Regress the adjusted squared errors (the original squared errors divided by the correction factor) on a list of explanatory variables thought to influence the heteroscedasticity. Following J&DN, we will assume that, from the original dataset, only the variables grade, exp, and union affect the heteroscedasticity. Hence:

adjerr2<-(g.resid^2)/dcf
g.bptest<-lm(adjerr2~grade+exp+union)
summary(g.bptest)

(v) This auxiliary regression gives a model (explained) sum of squares (ESS) of:

ESS<-sum((fitted(g.bptest)-mean(fitted(g.bptest)))^2)
ESS
 10.70477

(vi) Under the null hypothesis of homoscedasticity, (1/2) ESS asymptotically converges to a Chi-squared(k-1), where k is the number of coefficients in the auxiliary regression. Here k=4, so we compare (1/2) ESS = 5.352386 with the Chi-squared(3) 5% critical value, qchisq(.95,3) = 7.814728.

The calculated statistic halfESS = 5.35 falls short of the critical value 7.81, so the null hypothesis of homoscedasticity cannot be rejected.

Last update: September 20, 2007