Applied econometrics: Econ 508

Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to this new issue of e-Tutorial. We now focus on time series models, with special emphasis on tests for unit roots and cointegration. The theoretical background given in class is essential for the computational exercise below, so I recommend that you study Prof. Koenker’s Lectures 8 and 9 as you go through the tutorial.[1]

Data

The first thing you need to do is download the updated Thurman and Fisher (1988) data, called eggs.csv, from the Econ 508 web site. Save it in your preferred directory (I will save mine as “C:/eggs.csv”). The next step is to read the data into R:

  Thurman<-read.table("C:/eggs.csv", header=T, sep=",")

An alternative is to call your data from the web:

  Thurman<-read.table("http://www.econ.uiuc.edu/~econ508/data/eggs.csv", header=T, sep=",")

The next step is to declare chickens and eggs as time series:

  year<-ts(Thurman$year) 
  chic<-ts(Thurman$chic) 
  egg<-ts(Thurman$egg) 

Unit Root: Augmented Dickey-Fuller Test

First, it is important that you sketch the ADF test, explaining the NULL and the ALTERNATIVE hypotheses.
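As a starting point for that sketch, the regression estimated by the adf function given below can be written as follows. The symbols \(\alpha\), \(\delta\), \(\rho\), and \(\gamma_j\) are notation chosen here for illustration and may differ from the lecture notes:

\[
\Delta x_t = \alpha + \delta t + \rho\, x_{t-1} + \sum_{j=1}^{k} \gamma_j \, \Delta x_{t-j} + u_t,
\qquad H_0: \rho = 0 \ \text{(unit root)} \quad \text{vs.} \quad H_1: \rho < 0 .
\]

Setting int=FALSE drops \(\alpha\), and setting trend=FALSE drops \(\delta t\). The test statistic is the t-ratio on \(\rho\) (the coefficient reported as L(x) in the output), compared with Dickey-Fuller rather than Student-t critical values.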

ADF Test in R: I suggest that you use the R code adf.R, available at http://www.econ.illinois.edu/~econ508/routines.html:

"adf" <- function(x,k = 0, int = TRUE, trend = FALSE){
# NB:  returns conventional lm summary so p-values for adf test are wrong!
    require(dynlm)
    dx <- diff(x)
    formula <- paste("dx ~ L(x)")
    if(k > 0)
        formula <- paste(formula," + L(dx,1:k)")
    if(trend){
        s <- time(x)
        t <- ts(s - s[1],start = s[1],freq = frequency(x))
        formula <- paste(formula," + t")
        }
    if(!int) formula <- paste(formula," - 1")
    summary(dynlm(as.formula(formula)))
}

Your job is to copy the R code above and paste it into the R console. This will create an R function called “adf”, which runs the unit root test for each case. You should run the ADF test on each individual series (chickens and eggs), controlling for the number of lags and the inclusion of constants and trends.

If you don’t feel like downloading the file and copying and pasting it, you can source it directly from the web page:

  source("http://www.econ.uiuc.edu/~econ508/routines/adf.R")

Examples: ADF for Chickens

  • Models including constant and trend: For example, using 1 lag in the chicken series:
  adf(chic, k=1, int=T, trend=T) 
Loading required package: dynlm
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Time series regression with "ts" data:
Start = 3, End = 75

Call:
dynlm(formula = as.formula(formula))

Residuals:
   Min     1Q Median     3Q    Max 
-55584 -10044   1244   8846  77813 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  4.71e+04   3.10e+04    1.52    0.133  
L(x)        -1.14e-01   6.81e-02   -1.68    0.098 .
L(dx, 1:k)  -9.76e-02   1.22e-01   -0.80    0.428  
t           -2.08e-01   1.38e+02    0.00    0.999  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22000 on 69 degrees of freedom
Multiple R-squared:  0.0715,    Adjusted R-squared:  0.0312 
F-statistic: 1.77 on 3 and 69 DF,  p-value: 0.161

Then you can test the significance of the coefficient on L(x) by using the appropriate Dickey-Fuller critical values (Table B.6 in Hamilton 1994).

Here the null hypothesis is the presence of a unit root. The augmented Dickey-Fuller statistic is -1.678, which lies inside the acceptance region at the 1%, 5%, and 10% levels, as you can see from the tables. Therefore, we cannot reject the presence of a unit root.

If you don’t want to use the tables, the R package fUnitRoots provides the Dickey-Fuller table:

  install.packages("fUnitRoots")
  require(fUnitRoots)

The qadf function gives you the quantiles for the ADF test:

  qadf(0.01, N=75, trend="ct")
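If fUnitRoots is not available, the critical values can also be approximated by simulation. The following is a minimal sketch under stated assumptions: it uses base R only, no lagged differences (k = 0), a constant and a trend, and a sample size of 75. The names df_tstat, sims, and crit are illustrative and do not come from the course routines.

```r
# Monte Carlo approximation of Dickey-Fuller critical values
# (case with constant and trend, no lagged differences).
set.seed(508)

df_tstat <- function(n) {
  x  <- cumsum(rnorm(n))      # driftless random walk, so the null is true
  dx <- diff(x)               # x_t - x_{t-1}, for t = 2, ..., n
  lx <- x[-n]                 # x_{t-1}, aligned with dx
  tt <- seq_along(dx)         # linear time trend
  fit <- lm(dx ~ lx + tt)
  coef(summary(fit))["lx", "t value"]   # t-ratio on the lagged level
}

sims <- replicate(2000, df_tstat(75))
crit <- quantile(sims, c(0.01, 0.05, 0.10))
print(round(crit, 2))
```

The simulated quantiles should be close to the tabulated values for the constant-and-trend case, up to Monte Carlo error; increasing the number of replications reduces that error.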

From this starting point, you can add lags by changing k=1 to k=2, k=3, k=4, and so on. If you wish to exclude the intercept, just replace int=T with int=F. (As usual, T means true, i.e., inclusion, and F means false, i.e., exclusion.) The same applies to the inclusion/exclusion of the trend.

My suggestion is that you run 3 different types of ADF test, each of them with 1, 2, 3, and 4 lags:

  (i) Models with intercept and trend (int=T, trend=T)
  (ii) Models with intercept but without trend (int=T, trend=F)
  (iii) Models without intercept and without trend (int=F, trend=F)

  • Models including constant but no trend. Same rationale, but adjusting the command to:
  adf(chic, k=1, int=T, trend=F)

Time series regression with "ts" data:
Start = 3, End = 75

Call:
dynlm(formula = as.formula(formula))

Residuals:
   Min     1Q Median     3Q    Max 
-55583 -10040   1240   8838  77817 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  4.71e+04   2.48e+04    1.90    0.062 .
L(x)        -1.14e-01   5.99e-02   -1.91    0.061 .
L(dx, 1:k)  -9.76e-02   1.18e-01   -0.83    0.411  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21800 on 70 degrees of freedom
Multiple R-squared:  0.0715,    Adjusted R-squared:  0.045 
F-statistic:  2.7 on 2 and 70 DF,  p-value: 0.0745

  • Models excluding both constant and trend. As before, but adjusting the command to:
  adf(chic, k=1, int=F, trend=F)

Time series regression with "ts" data:
Start = 3, End = 75

Call:
dynlm(formula = as.formula(formula))

Residuals:
   Min     1Q Median     3Q    Max 
-63272  -8100   1536   9986  73866 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
L(x)       -0.00116    0.00628   -0.19     0.85
L(dx, 1:k) -0.15157    0.11675   -1.30     0.20

Residual standard error: 22200 on 71 degrees of freedom
Multiple R-squared:  0.0237,    Adjusted R-squared:  -0.00376 
F-statistic: 0.863 on 2 and 71 DF,  p-value: 0.426

Do that for each individual series. This will generate 12 regressions for chickens, and 12 for eggs. Very likely, some of them will indicate the presence of a unit root, while others will not. The best model can be chosen by calculating AIC, SIC, or any other reasonable criterion. At the end, please provide a table summarizing your results, and draw your conclusions.

After performing the test on the three models, what can you conclude?

The adf function gave you the tests for the annual chickens series, using 1 lag. We recommend that you repeat these 3 tests for lags 2, 3, and 4 as well. After you complete this cycle for chickens, you need to do the same cycle for eggs. At the end of both cycles, you will have 24 regression outputs. You don’t need to report all output details; you can instead concentrate on the ADF test statistic of each equation. Imagine that you are writing an academic paper. Don’t spend too much space on intermediate results; concentrate instead on your final conclusions, which may appear contradictory as you go through the different testing steps. In the end, you are expected to summarize your main results in a table, and then to write a paragraph commenting on the different results you obtain when you include/exclude trends/constants/lags for both the chickens and eggs series.

Comments on Unit Root Tests:

  • Unit root tests are very sensitive to the number of lags included and to the inclusion of constants and trends. That is why we are asking you to show all ADF statistics in the table above. Very likely, some of the results will indicate the presence of a unit root while others will not.

  • How can you draw a general conclusion from the test results with so many models available? Johnston & DiNardo (1997, p.226), for example, mention that one of the objectives of including lags is to achieve white noise residuals. Other authors recommend the use of AIC or SIC for model selection.

  • It is quite simple to calculate information criteria in ADF tests. Each output of adf corresponds to a linear regression on the lags, constant, and/or trend of the series. From the OLS regression, you can recover the sample size, the RSS, and the number of parameters required to calculate SIC or AIC, plus the original ADF statistic. But remember to use the Dickey-Fuller critical values.
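The calculation described in the last bullet can be sketched as follows. This is a base R illustration under stated assumptions, not the course routine: the helper adf_row is hypothetical, the series x is simulated (replace it with chic or egg), and AIC and SIC are computed as m log(RSS/m) plus the usual penalties, which is one common definition among several. All lag lengths are estimated on the same sample (fixed by kmax) so that the criteria are comparable.

```r
set.seed(508)
x <- cumsum(rnorm(75))             # simulated random walk (assumption)

adf_row <- function(x, k, int, trend, kmax = 4) {
  n <- length(x)
  # embed() puts dx_t in column 1 and dx_{t-1}, ..., dx_{t-kmax} after it
  E <- embed(diff(x), kmax + 1)
  d <- data.frame(y = E[, 1], lx = x[(kmax + 1):(n - 1)])  # lx is x_{t-1}
  rhs <- "lx"
  if (k > 0) {
    lags <- E[, 2:(k + 1), drop = FALSE]
    colnames(lags) <- paste0("dl", 1:k)
    d   <- cbind(d, lags)
    rhs <- c(rhs, colnames(lags))
  }
  if (trend) {
    d$tt <- seq_len(nrow(d))
    rhs  <- c(rhs, "tt")
  }
  fit <- lm(reformulate(rhs, response = "y", intercept = int), data = d)
  m   <- nrow(d)                   # sample size used in the regression
  rss <- sum(resid(fit)^2)         # residual sum of squares
  p   <- length(coef(fit))         # number of estimated parameters
  data.frame(k = k, int = int, trend = trend,
             adf = coef(summary(fit))["lx", "t value"],
             AIC = m * log(rss / m) + 2 * p,
             SIC = m * log(rss / m) + p * log(m))
}

# Three deterministic specifications times four lag lengths: 12 rows
det   <- list(c(TRUE, TRUE), c(TRUE, FALSE), c(FALSE, FALSE))
specs <- expand.grid(k = 1:4, spec = 1:3)
tab <- do.call(rbind, Map(function(k, s) adf_row(x, k, det[[s]][1], det[[s]][2]),
                          specs$k, specs$spec))
print(tab, digits = 3)
```

The adf column still has to be compared with Dickey-Fuller critical values for the matching deterministic specification; the information criteria only help in choosing the lag length.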

Cointegration: Engle-Granger Test

The first thing you should always do is sketch the Engle-Granger test, explaining the NULL and the ALTERNATIVE hypotheses.

Engle-Granger in R: The test can be done in 3 steps, as follows:

  1. Pre-test the variables for the presence of unit roots (done above) and check if they are integrated of the same order

  2. Regress the long run equilibrium model of chickens vs. eggs

  Engle<-lm(chic~egg) 
  summary(Engle)

Call:
lm(formula = chic ~ egg)

Residuals:
   Min     1Q Median     3Q    Max 
-57843 -30963 -15177  18559 169232 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 446387.63   27575.93   16.19   <2e-16 ***
egg             -6.23       5.05   -1.23     0.22    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 44200 on 73 degrees of freedom
Multiple R-squared:  0.0204,    Adjusted R-squared:  0.00696 
F-statistic: 1.52 on 1 and 73 DF,  p-value: 0.222

Obtain the residuals.

  residual<-resid(Engle)

Plot the residuals along time.

  plot(Thurman$year, residual, type="l", main="Chickens vs. eggs: Is there cointegration?", xlab="year", ylab="residuals")


Also plot the residuals against the lagged residuals. Draw your conclusions.

  3. Proceed with a unit root test on the residuals, i.e., test whether the residuals are \(I(0)\), just as you did with the ADF test for unit roots on chickens and eggs, but now considering lags 0 to 4. This is a residual-based version of the ADF test. The only difference between the traditional ADF test and (this version of) the Engle-Granger test is the critical values. The critical values to be used here are no longer those provided by Dickey and Fuller, but those provided by Engle and Yoo (1987) and others (see the approximate critical values in Table B.9 of Hamilton 1994). This is because the residuals above are not the actual error terms, but estimates from the long-run equilibrium equation of chickens against eggs.
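The residual-based step can be sketched in base R as follows. This is an illustration on simulated data, not the course routine: the series y1 and y2 and the helper eg_stat are hypothetical, and the two series are cointegrated by construction. The regression on the residuals omits the constant on the assumption that OLS residuals from a regression with an intercept already have mean zero.

```r
set.seed(508)
n  <- 75
w  <- cumsum(rnorm(n))             # common stochastic trend (assumption)
y1 <- 2 + 0.5 * w + rnorm(n)       # two simulated series sharing the trend,
y2 <- w + rnorm(n)                 # so they are cointegrated by construction

u <- resid(lm(y1 ~ y2))            # residuals of the long-run regression

# ADF-type regression on the residuals with k lagged differences
eg_stat <- function(u, k) {
  du <- diff(u)
  E  <- embed(du, k + 1)                 # du_t, du_{t-1}, ..., du_{t-k}
  lu <- u[(k + 1):(length(u) - 1)]       # u_{t-1}, aligned with du_t
  X  <- if (k > 0) cbind(lu, E[, -1, drop = FALSE]) else cbind(lu)
  coef(summary(lm(E[, 1] ~ X - 1)))[1, "t value"]  # t-ratio on u_{t-1}
}

eg <- sapply(0:4, function(k) eg_stat(u, k))
names(eg) <- paste0("k=", 0:4)
print(round(eg, 2))
```

With the actual data, u would be the residual vector from lm(chic~egg), and each statistic would be compared with the Engle-Granger type critical values rather than the Dickey-Fuller ones.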

Some authors (e.g., Enders, 1995) consider a fourth step, consisting of estimating error-correction models and checking model adequacy. However, you are not required to do that for the purposes of problem set 3.

At the end of the test, please provide a table summarizing your results. Comment on your findings.

Cointegration: Johansen Test

Again, we recommend that you sketch the Johansen test, explaining the NULL and the ALTERNATIVE hypotheses. Then we suggest that you use the R code johansen.R, provided by Prof. Koenker, and available at http://www.econ.uiuc.edu/~econ472/routines.html:

"johansen"<- function(x, L = 2){ 
  #Johansen Test of cointegration for multivariate time series x 
  #Returns vector of eigenvalues after that you are on your own. 
  #This is a modified version for R, in which rts is substituted by ts. 
        x <- ts(x) 
        n <- nrow(x) 
        p <- ncol(x) 
        Ly <- lag(x[, 1], -1) 
        D <- diff(x[, 1]) 
        for(i in 1:p) { 
                if(i > 1) { 
                        D <- ts.intersect(D, diff(x[, i])) 
                        Ly <- ts.intersect(Ly, lag(x[, i], -1)) 
                } 
                if(L > 0) 
                        for(j in 1:L) 
                                D <- ts.intersect(D, lag(diff(x[, i]),  - j)) 
        } 
        iys <- 1 + (L + 1) * (0:(p - 1)) 
        Y <- D[, iys] 
        X <- D[,  - iys] 
        Ly <- ts.intersect(Ly, D)[, 1:p] 
        ZD <- lm(Y ~ X)$resid 
        ZL <- lm(Ly ~ X)$resid 
        df <- nrow(X) - ncol(X) - 1 
        S00 <- crossprod(ZD)/df 
        S11 <- crossprod(ZL)/df 
        S01 <- crossprod(ZD, ZL)/df 
        M <- solve(S11) %*% t(S01) %*% solve(S00) %*% S01 
        eigen(M)$values 
} 

Your job is to copy the code above and paste it into the R console. This will create an R function called “johansen” that calculates the eigenvalues.

Once again, if you don’t feel like downloading the file and copying and pasting it, you can source it directly from the web page:

  source("http://www.econ.uiuc.edu/~econ508/routines/johansen.R")

The command to obtain the eigenvalues is:

johansen(cbind(egg,chic), L=1) 
[1] 0.14143 0.01169

The code above refers to the case including trend and intercept, and the appropriate critical values should be used. Note that the theoretical background is essential here, since you need to interpret the eigenvalues and calculate the test statistic yourself before drawing your conclusions.
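As a hedged illustration of that last calculation, the usual trace and maximum-eigenvalue statistics can be computed from the eigenvalues printed above. The effective sample size Tn = 73 is an assumption (75 observations minus one for differencing and one for the lag with L=1); check it against your own data. The variable names are illustrative.

```r
lambda <- c(0.14143, 0.01169)   # eigenvalues printed by johansen(), largest first
Tn     <- 73                    # assumed effective sample size (see text)
p      <- length(lambda)

# Trace statistic for H0: rank <= r, for r = 0, ..., p - 1
trace <- sapply(0:(p - 1), function(r) -Tn * sum(log(1 - lambda[(r + 1):p])))

# Maximum-eigenvalue statistic for H0: rank = r against rank = r + 1
maxeig <- -Tn * log(1 - lambda)

res <- data.frame(r = 0:(p - 1), trace = trace, maxeig = maxeig)
print(round(res, 3))
```

Each statistic is then compared with the Johansen critical values for the matching deterministic specification; those values are not reproduced here.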


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu