Applied econometrics: Econ 508

## TA: Nicolas Bottan

Welcome to e-Tutorial, your on-line help to Econ508. This issue provides an introduction to model selection in Econometrics, focusing on Akaike (AIC) and Schwarz (SIC) Information Criteria.1

# Data Set

The data set used in this tutorial was borrowed from Johnston and DiNardo’s Econometric Methods (1997, 4th ed), but slightly adjusted for your needs. It is called AUTO2. You can download the data by visiting the Econ 508 web site (Data). As you will see, this adapted data set contains five series.

    auto<-read.table("http://www.econ.uiuc.edu/~econ508/data/AUTO2.txt",header=T)
head(auto)
  quarter    gas price income miles
1    1959 -8.015 4.676 -4.505 2.648
2    1959 -8.011 4.691 -4.493 2.648
3    1959 -8.020 4.689 -4.499 2.648
4    1959 -8.013 4.722 -4.492 2.648
5    1960 -8.017 4.707 -4.490 2.647
6    1960 -7.976 4.699 -4.489 2.647

As we did before we need to transform the data in “time series” first:

    library(dyn)
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

as.Date, as.Date.numeric
    gas<-ts(auto$gas,start=1959,frequency=4) price<-ts(auto$price,start=1959,frequency=4)
income<-ts(auto$income,start=1959,frequency=4) miles<-ts(auto$miles,start=1959,frequency=4)

# Running a Generic Dynamic Models

In the PS2, question 1, for that specific data set (which is different than the one used here) you are asked to run a simple dynamic model in the following autorregressive distributed lag form:

$gas = a_{0} + a_{1} gas_{t-1} + a_{2} \Delta gas_{t-1} + a_{3} price + a_{4} \Delta price + a_{5} \Delta price_{t-1} + a_{6} income + a_{7} \Delta income + a_{8} \Delta income_{t-1} + \epsilon$

   f<-dyn$lm(gas~lag(gas,-1)+lag(diff(gas),-1)+price+diff(price)+lag(diff(price),-1)+income+diff(income)+lag(diff(income),-1)) summary(f)  Call: lm(formula = dyn(gas ~ lag(gas, -1) + lag(diff(gas), -1) + price + diff(price) + lag(diff(price), -1) + income + diff(income) + lag(diff(income), -1))) Residuals: Min 1Q Median 3Q Max -0.07405 -0.00755 0.00048 0.00775 0.04586 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.09297 0.12155 -0.76 0.446 lag(gas, -1) 0.97219 0.02540 38.27 < 2e-16 *** lag(diff(gas), -1) -0.17881 0.09073 -1.97 0.051 . price -0.01830 0.00962 -1.90 0.060 . diff(price) -0.23593 0.03734 -6.32 4.9e-09 *** lag(diff(price), -1) 0.05841 0.04459 1.31 0.193 income 0.00828 0.02052 0.40 0.687 diff(income) 0.27223 0.15497 1.76 0.082 . lag(diff(income), -1) 0.04469 0.15529 0.29 0.774 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.0156 on 117 degrees of freedom (3 observations deleted due to missingness) Multiple R-squared: 0.983, Adjusted R-squared: 0.982 F-statistic: 863 on 8 and 117 DF, p-value: <2e-16 The model above is your benchmark. You should now start your model selection process. Even when there exist commands to calculate the Akaike or the Schwarz criterion, in Econ 508 it is recommended that you compute them by hand, as taught in class, using the formulae given in Prof. Koenker’s Lecture Note 4: $AIC=log(\hat{\sigma_j}^2)+\frac{p_i}{n}*2$ $SIC=log(\hat{\sigma_j}^2)+\frac{p_i}{n}*log(n)$ In order to calculate the sample size, one could sum the degrees of freedom and the number of estimated parameters, so we have the following:  sample.size<-f$df + length(f$coeff) Finally:  aic<-log(sum(resid(f)^2)/sample.size)+(length(f$coeff)/sample.size)*2
aic
[1] -8.253
    bic<-log(sum(resid(f)^2)/sample.size)+(length(f$coeff)/sample.size)*log(sample.size) bic [1] -8.051 Next, we just need to compute variations of this model and the respective aic, sic, or both. To do so it may be best to build your own AIC and BIC function:  aic<-function(model){ n<-model$df + length(model$coeff) log(sum(resid(model)^2)/n)+(length(model$coeff)/n)*2
}
aic(f)
[1] -8.253
    bic<-function(model){
n<-model$df + length(model$coeff)
log(sum(resid(model)^2)/n)+(length(model$coeff)/n)*log(n) } bic(f) [1] -8.051 Next we specify the models we want to assess  m1 <- gas ~ lag(gas,-1) m2 <- gas ~ lag(gas,-1) + price m3 <- gas ~ lag(gas,-1) + price + diff(gas) m4 <- gas ~ lag(gas,-1) + price + diff(gas) + lag(diff(gas),-1) And then we build a loop that calculates and saves the AIC and BIC results in a matrix  M <- c(m1,m2,m3,m4) A <- array(0,c(4,2)) for(i in 1:4){ g <- dyn$lm(M[[i]])
A[i,1] <- aic(g)
A[i,2] <- bic(g)
}

then we check which model minimizes the criterion functions. The which.min() function “determines the location, i.e., index of the (first) minimum or maximum of a numeric (or logical) vector.”(see help for which.min function)

    A
        [,1]    [,2]
[1,]  -7.959  -7.914
[2,]  -7.975  -7.908
[3,] -68.422 -68.332
[4,] -69.066 -68.954
    aic_min<-which.min(A[,1])
bic_min<-which.min(A[,2])

model_aic<-dyn$lm(eval(parse(text=paste("m",aic_min,sep="")))) summary(model_aic) Warning: essentially perfect fit: summary may be unreliable  Call: lm(formula = dyn(eval(parse(text = paste("m", aic_min, sep = ""))))) Residuals: Min 1Q Median 3Q Max -1.06e-14 -5.50e-17 4.00e-18 2.01e-16 5.91e-16 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.14e-14 6.32e-15 1.80e+00 0.074 . lag(gas, -1) 1.00e+00 7.57e-16 1.32e+15 <2e-16 *** price 2.27e-16 5.53e-16 4.10e-01 0.683 diff(gas) 1.00e+00 4.87e-15 2.05e+14 <2e-16 *** lag(diff(gas), -1) 7.05e-16 4.84e-15 1.50e-01 0.884 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 9.86e-16 on 121 degrees of freedom (3 observations deleted due to missingness) Multiple R-squared: 1, Adjusted R-squared: 1 F-statistic: 4.39e+29 on 4 and 121 DF, p-value: <2e-16  model_bic<-dyn$lm(eval(parse(text=paste("m",bic_min,sep=""))))
summary(model_bic)
Warning: essentially perfect fit: summary may be unreliable

Call:
lm(formula = dyn(eval(parse(text = paste("m", bic_min, sep = "")))))

Residuals:
Min        1Q    Median        3Q       Max
-1.06e-14 -5.50e-17  4.00e-18  2.01e-16  5.91e-16

Coefficients:
Estimate Std. Error  t value Pr(>|t|)
(Intercept)        1.14e-14   6.32e-15 1.80e+00    0.074 .
lag(gas, -1)       1.00e+00   7.57e-16 1.32e+15   <2e-16 ***
price              2.27e-16   5.53e-16 4.10e-01    0.683
diff(gas)          1.00e+00   4.87e-15 2.05e+14   <2e-16 ***
lag(diff(gas), -1) 7.05e-16   4.84e-15 1.50e-01    0.884
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.86e-16 on 121 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared:     1,  Adjusted R-squared:     1
F-statistic: 4.39e+29 on 4 and 121 DF,  p-value: <2e-16

The last lines run the model for which the AIC and BIC are minimized.

As we previously mentioned most of functions are already programmed in R, that is the case with AIC and BIC. To obtain them you need to estimate a model and then call the function

    n <- length(fitted(f))
AIC(f,k=2)
[1] -680.3
    BIC(f)
[1] -652

Note that the number given by the AIC/BIC built in function is different than the programmed by ourselves, to check why you should check how the built in function was created. You can do that by typing ?AIC.

1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu