# e-TA 8: Unit Roots and Cointegration

Welcome to this new issue of e-Tutorial. We now focus on time series models, with special emphasis on tests for unit roots and cointegration. The theoretical background given in class is essential for the computational exercise below, so I recommend that you study Prof. Koenker’s Lectures 8 and 9 as you go through the tutorial.^{1}

# Data

The first thing you need to do is download the updated Thurman and Fisher (1988) data, called eggs.csv, from the Econ 508 web site. Save it in your preferred directory (I will save mine as “C:/eggs.csv”). The next step is to read the data into R:

` Thurman<-read.table("C:/eggs.csv", header=T, sep=",")`

An alternative is to call your data from the web:

` Thurman<-read.table("http://www.econ.uiuc.edu/~econ508/data/eggs.csv", header=T, sep=",")`

The next step is to declare chickens and eggs as time series:

```
year<-ts(Thurman$year)
chic<-ts(Thurman$chic)
egg<-ts(Thurman$egg)
```

# Unit Root: Augmented Dickey-Fuller Test

First, it is important that you sketch the ADF test, explaining the null and the alternative hypotheses.
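As a starting point for that sketch, the regression estimated by the `adf` function used in this tutorial (with `k` lagged differences, an intercept, and a trend) can be written as below. The symbols \(\alpha\), \(\delta\), \(\rho\), and \(\gamma_j\) are notation chosen here for illustration and may differ from the lecture notes.

```latex
\Delta x_t = \alpha + \delta t + \rho x_{t-1} + \sum_{j=1}^{k} \gamma_j \Delta x_{t-j} + \varepsilon_t,
\qquad H_0: \rho = 0 \quad \text{vs.} \quad H_1: \rho < 0 .
```

Under \(H_0\) the series has a unit root. Setting \(\delta = 0\), or both \(\alpha = 0\) and \(\delta = 0\), gives the other two specifications. The test statistic is the usual t-ratio on \(\rho\), compared with Dickey-Fuller rather than Student-t critical values.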

ADF Test in R: I suggest that you use the R code adf.R, available at http://www.econ.illinois.edu/~econ508/routines.html:

```
"adf" <- function(x,k = 0, int = TRUE, trend = FALSE){
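  # ADF regression of diff(x) on lagged x, k lagged differences, and an optional trend.
  # NB: returns conventional lm summary so p-values for adf test are wrong!
  require(dynlm)
  dx <- diff(x)
  formula <- paste("dx ~ L(x)")
  # add k lagged differences
  if(k > 0)
    formula <- paste(formula," + L(dx,1:k)")
  # add a linear time trend starting at zero
  if(trend){
    s <- time(x)
    t <- ts(s - s[1],start = s[1],freq = frequency(x))
    formula <- paste(formula," + t")
  }
  # drop the intercept if requested
  if(!int) formula <- paste(formula," - 1")
  summary(dynlm(as.formula(formula)))
}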
```

Your job is to copy the R code above and paste it into the R console. This will create an R function called “adf”, which runs the unit root regression for each case. You should apply the ADF test to each individual series (chickens and eggs), varying the number of lags and the inclusion of constants and trends.

If you don’t feel like downloading it and copying and pasting, you can source it directly from the web page:

` source("http://www.econ.uiuc.edu/~econ508/routines/adf.R")`

Examples: ADF for Chickens

- Models including constant and trend: For example, using 1 lag in the chicken series:

` adf(chic, k=1, int=T, trend=T) `

```
Loading required package: dynlm
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
```

```
Time series regression with "ts" data:
Start = 3, End = 75
Call:
dynlm(formula = as.formula(formula))
Residuals:
Min 1Q Median 3Q Max
-55584 -10044 1244 8846 77813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.71e+04 3.10e+04 1.52 0.133
L(x) -1.14e-01 6.81e-02 -1.68 0.098 .
L(dx, 1:k) -9.76e-02 1.22e-01 -0.80 0.428
t -2.08e-01 1.38e+02 0.00 0.999
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 22000 on 69 degrees of freedom
Multiple R-squared: 0.0715, Adjusted R-squared: 0.0312
F-statistic: 1.77 on 3 and 69 DF, p-value: 0.161
```

Then you can test the significance of the coefficient on *L(x)* using the appropriate Dickey-Fuller critical values (Table B.6 in Hamilton 1994).

Here the null hypothesis is the presence of a unit root. The augmented Dickey-Fuller statistic is -1.678, which lies inside the acceptance region at the 1%, 5%, and 10% levels, as you can see from the tables. Therefore, we cannot reject the presence of a unit root.

If you don’t want to use the tables, there’s a package in R called `fUnitRoots` that gives you the Dickey-Fuller table:

```
install.packages("fUnitRoots")
require(fUnitRoots)
```

The `qadf` function will give you the quantiles for the ADF test:

` qadf(0.01, N=75, trend="ct")`

From this starting point, you can add lags by changing k=1 to k=2, k=3, k=4, and so on. If you wish to exclude the intercept, just replace int=T with int=F. (As usual, T means true, i.e., inclusion, and F means false, i.e., exclusion.) The same applies to the inclusion or exclusion of the trend.

My suggestion is that you run 3 different types of ADF regression, each with 1, 2, 3, and 4 lags: (i) models with intercept and trend (int=T, trend=T); (ii) models with intercept but without trend (int=T, trend=F); (iii) models without intercept and without trend (int=F, trend=F).

- Models including constant but no trend. Same rationale, but adjusting the command to:

` adf(chic, k=1, int=T, trend=F)`

```
Time series regression with "ts" data:
Start = 3, End = 75
Call:
dynlm(formula = as.formula(formula))
Residuals:
Min 1Q Median 3Q Max
-55583 -10040 1240 8838 77817
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.71e+04 2.48e+04 1.90 0.062 .
L(x) -1.14e-01 5.99e-02 -1.91 0.061 .
L(dx, 1:k) -9.76e-02 1.18e-01 -0.83 0.411
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21800 on 70 degrees of freedom
Multiple R-squared: 0.0715, Adjusted R-squared: 0.045
F-statistic: 2.7 on 2 and 70 DF, p-value: 0.0745
```

- Models excluding both constant and trend. Same again, adjusting the command to:

` adf(chic, k=1, int=F, trend=F)`

```
Time series regression with "ts" data:
Start = 3, End = 75
Call:
dynlm(formula = as.formula(formula))
Residuals:
Min 1Q Median 3Q Max
-63272 -8100 1536 9986 73866
Coefficients:
Estimate Std. Error t value Pr(>|t|)
L(x) -0.00116 0.00628 -0.19 0.85
L(dx, 1:k) -0.15157 0.11675 -1.30 0.20
Residual standard error: 22200 on 71 degrees of freedom
Multiple R-squared: 0.0237, Adjusted R-squared: -0.00376
F-statistic: 0.863 on 2 and 71 DF, p-value: 0.426
```

Do that for each individual series. This will generate 12 regressions for chickens, and 12 for eggs. Very likely, some of them will indicate the presence of a unit root, while others will not. The best model can be chosen by calculating AIC, SIC, or any other reasonable criterion. At the end, please provide a table summarizing your results, and draw your conclusions.

After performing the test on the three models, what can you conclude?

The `adf` function gave you the tests for the annual chickens series using 1 lag. We recommend that you repeat these three specifications for lags 2, 3, and 4 as well. After you complete this cycle for chickens, do the same for eggs. At the end of both cycles, you will have 24 regression outputs. You don’t need to report all output details; concentrate on the ADF test statistic of each equation. Think of this as writing an academic paper: don’t spend too much space on intermediate results, and concentrate instead on your final conclusions, which may appear contradictory as you go through the different testing steps. In the end, you are expected to summarize your main results in a table, and then to write a paragraph commenting on how the results change when you include or exclude trends, constants, and lags for both the chickens and eggs series.
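One way to collect the 12 statistics for a series into a single table is sketched below. It is only an illustration under stated assumptions: `adf_tstat` is a hypothetical helper written here in base R (it is not the course's `adf` function, though it fits the same regression), and `x` is a simulated random walk standing in for `chic` or `egg`.

```r
# Hypothetical helper: t-statistic on x_{t-1} in the ADF regression, base R only.
adf_tstat <- function(x, k = 1, int = TRUE, trend = FALSE) {
  x  <- as.numeric(x)
  n  <- length(x)
  dx <- diff(x)
  E  <- embed(dx, k + 1)                  # col 1: dx_t; cols 2..k+1: dx_{t-1}, ..., dx_{t-k}
  y  <- E[, 1]
  X  <- cbind(xlag = x[(k + 1):(n - 1)])  # x_{t-1}, aligned with y
  if (k > 0) X <- cbind(X, E[, -1, drop = FALSE])
  if (trend) X <- cbind(X, trend = seq_along(y))
  fit <- if (int) lm(y ~ X) else lm(y ~ X - 1)
  # x_{t-1} is the first regressor: row 2 with an intercept, row 1 without
  coef(summary(fit))[if (int) 2 else 1, "t value"]
}

set.seed(508)
x <- cumsum(rnorm(75))                    # ASSUMPTION: simulated data, not the eggs data

# Three specifications (i)-(iii) crossed with lags 1 to 4
specs   <- data.frame(int = c(TRUE, TRUE, FALSE), trend = c(TRUE, FALSE, FALSE))
results <- expand.grid(spec = 1:3, k = 1:4)
results$int   <- specs$int[results$spec]
results$trend <- specs$trend[results$spec]
results$adf   <- mapply(adf_tstat, k = results$k, int = results$int,
                        trend = results$trend, MoreArgs = list(x = x))
print(results[, c("int", "trend", "k", "adf")], digits = 3)
```

Replacing `x` with `chic` or `egg` should give the statistics for the actual series. Each one still has to be compared with the Dickey-Fuller critical value for the matching specification.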

# Cointegration: Engle-Granger Test

The first thing you should always do is sketch the Engle-Granger test, explaining the null and the alternative hypotheses.
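As a possible starting point for that sketch, the two regressions behind the residual-based test can be written as follows. The notation (\(\beta_0\), \(\beta_1\), \(\rho\), \(\gamma_j\)) is chosen here for illustration, and omitting the constant in the second equation is an assumption rather than something prescribed in this tutorial.

```latex
\text{chic}_t = \beta_0 + \beta_1\, \text{egg}_t + u_t,
\qquad
\Delta \hat{u}_t = \rho\, \hat{u}_{t-1} + \sum_{j=1}^{k} \gamma_j \Delta \hat{u}_{t-j} + e_t,
\qquad
H_0: \rho = 0 \quad \text{vs.} \quad H_1: \rho < 0 .
```

Under \(H_0\) the residuals have a unit root, so the two series are not cointegrated. Rejecting \(H_0\) is evidence of cointegration.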

Engle-Granger in R: The test can be done in 3 steps, as follows:

- Pre-test the variables for the presence of unit roots (done above) and check whether they are integrated of the same order.

- Regress the long run equilibrium model of chickens vs. eggs:

```
Engle<-lm(chic~egg)
summary(Engle)
```

```
Call:
lm(formula = chic ~ egg)
Residuals:
Min 1Q Median 3Q Max
-57843 -30963 -15177 18559 169232
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 446387.63 27575.93 16.19 <2e-16 ***
egg -6.23 5.05 -1.23 0.22
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 44200 on 73 degrees of freedom
Multiple R-squared: 0.0204, Adjusted R-squared: 0.00696
F-statistic: 1.52 on 1 and 73 DF, p-value: 0.222
```

Obtain the residuals.

` residual<-resid(Engle)`

Plot the residuals along time.

` plot(as.numeric(year), residual, type="l", main="Chickens vs. eggs: Is there cointegration?", xlab="year", ylab="residuals")`

Also plot the residuals against the lagged residuals, and draw your conclusions.
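A minimal sketch of the lagged-residual plot follows. To keep it self-contained it uses two simulated, independent random walks in place of the actual `chic` and `egg` series; the names `res_lag`, `res_now`, and `rho_hat` are introduced here only for illustration.

```r
set.seed(508)
n    <- 75
egg  <- cumsum(rnorm(n))          # ASSUMPTION: simulated I(1) series, not the real data
chic <- cumsum(rnorm(n))          # independent of egg, so no cointegration by construction
residual <- as.numeric(resid(lm(chic ~ egg)))

res_lag <- residual[-n]           # u_{t-1}
res_now <- residual[-1]           # u_t

plot(res_lag, res_now, xlab = "residual (t-1)", ylab = "residual (t)",
     main = "Residuals vs. lagged residuals")
abline(lm(res_now ~ res_lag), lty = 2)

# Slope of the fitted line: values close to 1 point to very persistent residuals
rho_hat <- unname(coef(lm(res_now ~ res_lag))[2])
rho_hat
```

Points lying close to a line with slope near one suggest that the residuals behave like a unit root process. This is only an informal check ahead of the formal test.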

- Proceed with a unit root test on the residuals, i.e. test whether the residuals are \(I(0)\), just as you did with the ADF test for chickens and eggs, but now considering lags 0 to 4. This is a residual-based version of the ADF test. The only difference between the traditional ADF test and (this version of) the Engle-Granger test is the critical values: they are no longer those provided by Dickey and Fuller, but those provided by Engle and Yoo (1987) and others (see the approximate critical values in Table B.9, Hamilton 1994). The reason is that the residuals above are not the actual error terms, but estimates from the long run equilibrium regression of chickens on eggs.
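The residual-based test for lags 0 to 4 could be organized roughly as below. This is a sketch under assumptions: the data are simulated, `eg_tstat` is a hypothetical helper, and the regression omits the constant because OLS residuals from a model with an intercept have mean zero (you may prefer a different specification).

```r
set.seed(508)
n    <- 75
egg  <- cumsum(rnorm(n))          # ASSUMPTION: simulated series, not the real data
chic <- cumsum(rnorm(n))
u    <- as.numeric(resid(lm(chic ~ egg)))   # residuals from the long run regression

# Hypothetical helper: t-statistic on u_{t-1} with k lagged differences, no constant
eg_tstat <- function(u, k) {
  du <- diff(u)
  E  <- embed(du, k + 1)                        # col 1: du_t; remaining cols: lags of du
  y  <- E[, 1]
  X  <- cbind(ulag = u[(k + 1):(length(u) - 1)])  # u_{t-1}, aligned with y
  if (k > 0) X <- cbind(X, E[, -1, drop = FALSE])
  coef(summary(lm(y ~ X - 1)))[1, "t value"]
}

eg_table <- data.frame(k = 0:4,
                       t_stat = sapply(0:4, function(k) eg_tstat(u, k)))
print(eg_table, digits = 3)
```

The statistics in `t_stat` are to be compared with the residual-based critical values mentioned above, not with the standard Dickey-Fuller ones.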

Some authors (e.g., Enders, 1995) consider a fourth step, consisting of estimating error-correction models and checking model adequacy. However, you are not required to do that for the purposes of problem set 3.

At the end of the test, please provide a table summarizing your results. Comment on your findings.

# Cointegration: Johansen Test

Again, we recommend that you sketch the Johansen test, explaining the null and the alternative hypotheses. Then we suggest that you use the R code johansen.R, provided by Prof. Koenker, and available at http://www.econ.uiuc.edu/~econ472/routines.html:

```
"johansen"<- function(x, L = 2){
#Johansen Test of cointegration for multivariate time series x
#Returns vector of eigenvalues after that you are on your own.
#This is a modified version for R, in which rts is substituted by ts.
x <- ts(x)
n <- nrow(x)
p <- ncol(x)
Ly <- lag(x[, 1], -1)
D <- diff(x[, 1])
for(i in 1:p) {
if(i > 1) {
D <- ts.intersect(D, diff(x[, i]))
Ly <- ts.intersect(Ly, lag(x[, i], -1))
}
if(L > 0)
for(j in 1:L)
D <- ts.intersect(D, lag(diff(x[, i]), - j))
}
iys <- 1 + (L + 1) * (0:(p - 1))
Y <- D[, iys]
X <- D[, - iys]
Ly <- ts.intersect(Ly, D)[, 1:p]
ZD <- lm(Y ~ X)$resid
ZL <- lm(Ly ~ X)$resid
df <- nrow(X) - ncol(X) - 1
S00 <- crossprod(ZD)/df
S11 <- crossprod(ZL)/df
S01 <- crossprod(ZD, ZL)/df
M <- solve(S11) %*% t(S01) %*% solve(S00) %*% S01
eigen(M)$values
}
```

Your job is to copy the code above and paste it into the R console. This will create an R function called “johansen” that calculates the eigenvalues.

Once again, if you don’t feel like downloading it and copying and pasting, you can source it directly from the web page:

` source("http://www.econ.uiuc.edu/~econ508/routines/johansen.R")`

The command to obtain the eigenvalues is:

`johansen(cbind(egg,chic), L=1) `

`[1] 0.14143 0.01169`

The code above refers to the case including trend and intercept, and the appropriate critical values should be used. Note that the theoretical background is essential here, since you need to interpret the eigenvalues and calculate the test statistic yourself before drawing your conclusions.
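As a hedged illustration of that last step, the usual trace and maximum-eigenvalue statistics can be computed from the eigenvalues as \(-T \sum_{i>r} \log(1-\lambda_i)\) and \(-T \log(1-\lambda_{r+1})\). In the sketch below, `T_eff = 73` is an assumption about the effective sample size (75 observations, less one for differencing and one for the lag when `L=1`); check it against the lecture notes.

```r
lambda <- c(0.14143, 0.01169)   # eigenvalues reported above, largest first
T_eff  <- 73                    # ASSUMPTION: effective sample size with L = 1

# Trace statistic for H0: rank <= r, r = 0, 1 (sum over the remaining eigenvalues)
trace_stat <- -T_eff * rev(cumsum(rev(log(1 - lambda))))

# Maximum-eigenvalue statistic for H0: rank = r against rank = r + 1
max_stat <- -T_eff * log(1 - lambda)

johansen_table <- data.frame(H0 = c("r = 0", "r <= 1"),
                             trace = trace_stat,
                             max_eigen = max_stat)
print(johansen_table, digits = 4)
```

These statistics are then compared with the Johansen critical values for the case with trend and intercept. The sketch does not supply those values.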

^{1} Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu

## Comments on Unit Root Tests:

Unit root tests are very sensitive to the number of included lags and to the inclusion of constants and trends. That is why we ask you to show all ADF statistics in the table above. Very likely, some of the results will indicate the presence of a unit root while others will not.

How can you draw a general conclusion from the test results with so many models available? Johnston & DiNardo (1997, p.226), for example, mention that one of the objectives of including lags is to achieve white noise residuals. Other authors recommend the use of AIC or SIC for model selection.

It is quite simple to calculate information criteria in ADF tests. Each output of `adf` corresponds to a linear regression on the lags, constant, and/or trend of the series. From the OLS regression, you recover the sample size, the RSS, and the number of parameters required to calculate SIC or AIC, plus the original ADF statistic. But remember to use the Dickey-Fuller critical values.
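A minimal sketch of that calculation is given below, using one common form of the criteria, \(n \log(\mathrm{RSS}/n) + 2p\) for AIC and \(n \log(\mathrm{RSS}/n) + p \log(n)\) for SIC. The series is simulated and the regression is refit with plain `lm` (intercept, one lag, no trend), so the variable names are illustrative and not part of `adf`.

```r
set.seed(508)
x  <- cumsum(rnorm(75))           # ASSUMPTION: simulated series standing in for chic or egg
k  <- 1
dx <- diff(x)
E  <- embed(dx, k + 1)            # col 1: dx_t; col 2: dx_{t-1}
y     <- E[, 1]
xlag  <- x[(k + 1):(length(x) - 1)]   # x_{t-1}
dlag1 <- E[, 2]
fit <- lm(y ~ xlag + dlag1)       # ADF regression with intercept and one lag

n   <- length(y)                  # sample size used in the regression
rss <- sum(resid(fit)^2)          # residual sum of squares
p   <- length(coef(fit))          # number of estimated coefficients

aic <- n * log(rss / n) + 2 * p
sic <- n * log(rss / n) + p * log(n)
adf_stat <- coef(summary(fit))["xlag", "t value"]

round(c(n = n, p = p, AIC = aic, SIC = sic, ADF = adf_stat), 3)
```

These values differ from R's built-in `AIC()` and `BIC()` only by terms that are constant for a given sample size, so the ranking of models is unaffected as long as the models are compared on the same sample. Changing the number of lags changes the usable sample, which is worth keeping in mind when comparing across lag lengths.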