
Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker

TA: Nicolas Bottan

Welcome to e-Tutorial, your online help for Econ 508. This issue provides an introduction to model selection in econometrics, focusing on the Akaike (AIC) and Schwarz (SIC) information criteria.[1]

Data Set

The data set used in this tutorial is borrowed from Johnston and DiNardo's Econometric Methods (1997, 4th ed.), slightly adjusted for your needs. It is called AUTO2, and you can download it from the Econ 508 web site (Data). As you will see, this adapted data set contains five series: quarter, gas, price, income, and miles.

    use AUTO2.dta, clear
    list in 1/10

         +-------------------------------------------------------+
         | quarter         gas      price      income      miles |
         |-------------------------------------------------------|
      1. |  1959.1   -8.015248    4.67575    -4.50524   2.647592 |
      2. |  1959.2    -8.01106   4.691292   -4.492739   2.647592 |
      3. |  1959.3   -8.019878   4.689134   -4.498873   2.647592 |
      4. |  1959.4   -8.012581   4.722338   -4.491904   2.647592 |
      5. |  1960.1   -8.016769    4.70747   -4.490103   2.647415 |
         |-------------------------------------------------------|
      6. |  1960.2   -7.976376   4.699136   -4.489107   2.647238 |
      7. |  1960.3   -7.997135    4.72129   -4.492301   2.647061 |
      8. |  1960.4   -8.005725   4.722736   -4.496271   2.646884 |
      9. |  1961.1   -8.009368   4.706207   -4.489013   2.648654 |
     10. |  1961.2   -7.989948   4.675196   -4.477735   2.650421 |
         +-------------------------------------------------------+

As before, we first need to declare the data as a time series:

    gen t = _n
    label variable t "Integer time period"
    tsset t

            time variable:  t, 1 to 128
                    delta:  1 unit

Running a Generic Dynamic Model

In PS2, question 1, you are asked to run a simple dynamic model, for a specific data set (different from the one used here), in the following autoregressive distributed lag form:

\[gas_{t} = a_{0} + a_{1} gas_{t-1} + a_{2} \Delta gas_{t-1} + a_{3} price_{t} + a_{4} \Delta price_{t} + a_{5} \Delta price_{t-1} + a_{6} income_{t} + a_{7} \Delta income_{t} + a_{8} \Delta income_{t-1} + \epsilon_{t}\]

In STATA, you can run this model as follows:

    regress gas L.gas LD.gas price D.price LD.price income D.income LD.income

          Source |       SS       df       MS              Number of obs =     126
    -------------+------------------------------           F(  8,   117) =  863.16
           Model |  1.67892182     8  .209865228           Prob > F      =  0.0000
        Residual |  .028446793   117  .000243135           R-squared     =  0.9833
    -------------+------------------------------           Adj R-squared =  0.9822
           Total |  1.70736862   125  .013658949           Root MSE      =  .01559

    ------------------------------------------------------------------------------
             gas |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             gas |
             L1. |   .9721906   .0254024    38.27   0.000     .9218825    1.022499
             LD. |  -.1788088   .0907299    -1.97   0.051    -.3584947    .0008771
                 |
           price |
             --. |  -.0183001   .0096211    -1.90   0.060    -.0373542     .000754
             D1. |  -.2359339   .0373382    -6.32   0.000    -.3098801   -.1619876
             LD. |   .0584094   .0445919     1.31   0.193    -.0299024    .1467213
                 |
          income |
             --. |   .0082806   .0205162     0.40   0.687    -.0323507    .0489119
             D1. |   .2722332   .1549735     1.76   0.082    -.0346836    .5791501
             LD. |   .0446936   .1552938     0.29   0.774    -.2628576    .3522449
                 |
           _cons |  -.0929674     .12155    -0.76   0.446    -.3336909    .1477561
    ------------------------------------------------------------------------------
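The operators in the command above map onto the terms of the equation: L.x is the first lag, D.x is the first difference, and LD.x is the lagged first difference. Because LD.x needs two earlier observations, the regression uses 126 of the 128 quarters. The following is a minimal sketch on a simulated series, not on AUTO2; the variable names x, lag_x, dif_x, ldif_x and the *_chk copies, as well as the seed, are hypothetical and chosen only for illustration.

```stata
* Simulated series, only to illustrate the operators (not AUTO2)
clear
set obs 10
set seed 508
gen t = _n
tsset t
gen double x = rnormal()

* The three operator forms used in the regression command
gen double lag_x  = L.x
gen double dif_x  = D.x
gen double ldif_x = LD.x

* The same quantities written with explicit subscripts
gen double lag_chk  = x[_n-1]
gen double dif_chk  = x - x[_n-1]
gen double ldif_chk = x[_n-1] - x[_n-2]

list t x lag_x dif_x ldif_x, noobs
* Observations lost to the lagged difference
count if missing(ldif_x)
```

Comparing each operator column with its *_chk counterpart shows that they coincide wherever both are defined.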

The model above is your benchmark. You should now start your model selection process.

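Although commands exist to calculate the Akaike and Schwarz criteria, in Econ 508 it is recommended that you compute them by hand, as taught in class, using the formulae given in Prof. Koenker's Lecture Note 4:

\[AIC_j=\log(\hat{\sigma}_j^2)+\frac{p_j}{n} \cdot 2\] \[SIC_j=\log(\hat{\sigma}_j^2)+\frac{p_j}{n} \cdot \log(n)\]

Here the variance estimate for model j is its residual sum of squares divided by n, p_j is the number of estimated parameters in model j (including the constant), and n is the sample size.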
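If you compare your hand-computed values with Stata's built-in estat ic, the numbers will look very different, because that command reports the criteria on the log-likelihood scale (AIC = -2 lnL + 2k, BIC = -2 lnL + k ln n). For a linear regression with Gaussian likelihood, -2 lnL = n[ln(2 pi) + 1 + ln(RSS/n)], so dividing by n and subtracting 1 + ln(2 pi) recovers the per-observation form used in this course. The sketch below checks this on simulated data; the variables x and y, the seed, and the scalar names are hypothetical, and it assumes k equals e(rank), the number of estimated coefficients.

```stata
* Simulated regression, only to compare the two scalings (not AUTO2)
clear
set obs 100
set seed 508
gen double x = rnormal()
gen double y = 1 + 2*x + rnormal()
quietly regress y x

scalar n = e(N)
scalar p = e(rank)

* Per-observation form used in this tutorial
scalar aic_ln = ln(e(rss)/n) + (p/n)*2
scalar sic_ln = ln(e(rss)/n) + (p/n)*ln(n)

* Log-likelihood form, built from the stored log likelihood e(ll)
scalar aic_ll = -2*e(ll) + 2*p
scalar bic_ll = -2*e(ll) + p*ln(n)

* Rescale the log-likelihood form to the per-observation form
scalar aic_back = aic_ll/n - (1 + ln(2*_pi))
scalar sic_back = bic_ll/n - (1 + ln(2*_pi))

display "AIC: " aic_ln "  vs rescaled: " aic_back
display "SIC: " sic_ln "  vs rescaled: " sic_back
```

Since the transformation is the same for every model estimated on the same n observations, both scalings rank such models identically.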

In STATA, you can calculate various information criteria and other important statistics using functions to extract matrices and scalars generated by the regression operation. After regress (or, for the number of parameters, after getting the matrix b), type:

    Statistic            Command                      or, equivalently
    ----------------------------------------------------------------------
    Sample size          scalar A = _result(1)        scalar A = e(N)
    Model SS             scalar B = _result(2)        scalar B = e(mss)
    Model DF             scalar C = _result(3)        scalar C = e(df_m)
    Residual SS          scalar D = _result(4)        scalar D = e(rss)
    Residual df          scalar E = _result(5)        scalar E = e(df_r)
    F-Statistic          scalar F = _result(6)        scalar F = e(F)
    R-Squared            scalar G = _result(7)        scalar G = e(r2)
    Adj. R-Squared       scalar H = _result(8)        scalar H = e(r2_a)
    Root MSE             scalar I = _result(9)        scalar I = e(rmse)
    Coefficients         matrix b = get(_b)           matrix b = e(b)
    # of parameters      scalar K = colsof(b)
    Covariance matrix    matrix v = get(VCE)          matrix v = e(V)

It is not necessary to memorize these. You can always obtain the names of the saved results by typing "ret list" and/or "eret list" after the regression (or after summarize). The scalars and matrices defined above can then be listed:

    scalar list A B C D E F G H I K

             A =        126
             B =  1.6789218
             C =          8
             D =  .02844679
             E =        117
             F =  863.16345
             G =  .98333881
             H =  .98219958
             I =  .01559279
             K =          9

    matrix list b

    b[1,9]
                        L.         LD.                      D.         LD.
                       gas         gas       price       price       price      income
            y1    .9721906  -.17880883  -.01830014  -.23593387   .05840944    .0082806

    matrix list v

                        L.         LD.                      D.         LD.
                       gas         gas       price       price       price      income
         L.gas   .00064528
        LD.gas  -.00044148   .00823192
         price    .0000944   7.738e-06   .00009257
       D.price  -.00013958   .00013756  -.00003085   .00139414
      LD.price  -.00013149   .00182434  -.00005647  -.00046606   .00198843
        income  -.00045971   .00041162  -.00008305   .00008961   .00015336   .00042091
      D.income   .00048852  -.00068779   .00018875   .00096361   .00057653  -.00023978
     LD.income   .00047985  -.00230927   .00018465  -.00049299   .00089038  -.00024082
         _cons   .00263343  -.00174625  -.00005413   -.0005658  -.00012155  -.00141118
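Once b and v are stored, any entry of the coefficient table can be rebuilt from them; for instance, a t-statistic is a coefficient divided by the square root of its own variance. The sketch below illustrates this for the first two regressors (L.gas and LD.gas). So that it runs without the data, it types in the values from the listings above instead of reading e(b) and e(V); the names b2, v2, t_Lgas, and t_LDgas are hypothetical.

```stata
* First two coefficients (L.gas, LD.gas) and their 2x2 covariance block,
* copied from the matrix listings above
matrix b2 = (.9721906, -.17880883)
matrix v2 = (.00064528, -.00044148 \ -.00044148, .00823192)

* t-statistic = coefficient / sqrt(own variance on the diagonal of v)
scalar t_Lgas  = b2[1,1]/sqrt(v2[1,1])
scalar t_LDgas = b2[1,2]/sqrt(v2[2,2])

display "t(L.gas) = " %6.2f t_Lgas "   t(LD.gas) = " %6.2f t_LDgas
```

After an actual regress, the same calculation would use b[1,1] and v[1,1] from the matrices extracted with e(b) and e(V).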


You can get the Akaike Information Criterion as follows:

    scalar AIC=log(_result(4)/_result(1))+(colsof(b)/_result(1))*2
    scalar list AIC

           AIC = -8.2531446

You can get the Schwarz Information Criterion as follows:

    scalar SIC=log(_result(4)/_result(1))+(colsof(b)/_result(1))*log(_result(1))
    scalar list SIC

           SIC = -8.0505531
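As a check on the arithmetic, both criteria can be reproduced from the numbers printed in the regression table alone. The sketch below hard-codes the residual SS, sample size, residual df, and parameter count reported above; the scalar names rss, n, p, df_r, s2_n, and s2_df are just illustrative. It also makes explicit that the criteria use RSS/n, not the degrees-of-freedom-corrected variance RSS/(n - p) whose square root is the Root MSE.

```stata
* Values copied from the regression output above
scalar rss  = .028446793
scalar n    = 126
scalar p    = 9
scalar df_r = 117

* Variance estimate used in the criteria: RSS/n
scalar s2_n  = rss/n
* Variance estimate behind Root MSE: RSS/(n - p)
scalar s2_df = rss/df_r

scalar aic = ln(s2_n) + (p/n)*2
scalar sic = ln(s2_n) + (p/n)*ln(n)

display "AIC = " %10.7f aic "   SIC = " %10.7f sic
display "sqrt(RSS/df_r) = " %9.7f sqrt(s2_df)
```

Using e(rmse)^2 in place of e(rss)/e(N) would shift every criterion value, so it is worth checking which variance estimate your own code uses.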

Programming in STATA, I:

How to Obtain Information Criteria

To help you with model selection, copy and paste the following code into the do-file editor:

    ***************************START HERE**********************************************
    * A small do-file to calculate AIC and SIC in STATA
    * use "AUTO2.dta", clear
    * gen t=_n
    * label variable t "Integer time period"
    * tsset t
    *
    * For each model j, the penalty term uses colsof(bj), the number of
    * parameters in that model (including the constant)
    *
    * Model 1.1: Full Model
    regress gas L.gas LD.gas price D.price LD.price income D.income LD.income
    matrix b1=e(b)
    scalar AIC1=log(e(rss)/e(N))+(colsof(b1)/e(N))*2
    scalar SIC1=log(e(rss)/e(N))+(colsof(b1)/e(N))*log(e(N))
    *
    * Model 1.2: Drop LD.income
    regress gas L.gas LD.gas price D.price LD.price income D.income
    matrix b2=e(b)
    scalar AIC2=log(e(rss)/e(N))+(colsof(b2)/e(N))*2
    scalar SIC2=log(e(rss)/e(N))+(colsof(b2)/e(N))*log(e(N))
    *
    * Model 1.3: Drop LD.price
    regress gas L.gas LD.gas price D.price income D.income LD.income
    matrix b3=e(b)
    scalar AIC3=log(e(rss)/e(N))+(colsof(b3)/e(N))*2
    scalar SIC3=log(e(rss)/e(N))+(colsof(b3)/e(N))*log(e(N))
    *
    * Model 1.4: Drop LD.gas
    regress gas L.gas price D.price LD.price income D.income LD.income
    matrix b4=e(b)
    scalar AIC4=log(e(rss)/e(N))+(colsof(b4)/e(N))*2
    scalar SIC4=log(e(rss)/e(N))+(colsof(b4)/e(N))*log(e(N))
    *
    * Model 1.5: Drop LD.price, LD.income
    regress gas L.gas LD.gas price D.price income D.income
    matrix b5=e(b)
    scalar AIC5=log(e(rss)/e(N))+(colsof(b5)/e(N))*2
    scalar SIC5=log(e(rss)/e(N))+(colsof(b5)/e(N))*log(e(N))
    *
    * Model 1.6: Drop LD.gas, LD.income
    regress gas L.gas price D.price LD.price income D.income
    matrix b6=e(b)
    scalar AIC6=log(e(rss)/e(N))+(colsof(b6)/e(N))*2
    scalar SIC6=log(e(rss)/e(N))+(colsof(b6)/e(N))*log(e(N))
    *
    * Model 1.7: Drop LD.gas, LD.price
    regress gas L.gas price D.price income D.income LD.income
    matrix b7=e(b)
    scalar AIC7=log(e(rss)/e(N))+(colsof(b7)/e(N))*2
    scalar SIC7=log(e(rss)/e(N))+(colsof(b7)/e(N))*log(e(N))
    *
    * Model 1.8: Drop LD.gas, LD.price, LD.income
    regress gas L.gas price D.price income D.income
    matrix b8=e(b)
    scalar AIC8=log(e(rss)/e(N))+(colsof(b8)/e(N))*2
    scalar SIC8=log(e(rss)/e(N))+(colsof(b8)/e(N))*log(e(N))
    *
    * Model 1.9: Drop LD.gas, LD.price, D.income, LD.income
    regress gas L.gas price D.price income
    matrix b9=e(b)
    scalar AIC9=log(e(rss)/e(N))+(colsof(b9)/e(N))*2
    scalar SIC9=log(e(rss)/e(N))+(colsof(b9)/e(N))*log(e(N))
    *
    * Model 1.10: Drop LD.gas, D.price, LD.price, LD.income
    regress gas L.gas price income D.income
    matrix b10=e(b)
    scalar AIC10=log(e(rss)/e(N))+(colsof(b10)/e(N))*2
    scalar SIC10=log(e(rss)/e(N))+(colsof(b10)/e(N))*log(e(N))
    *
    * Model 1.11: Drop LD.gas, D.price, LD.price, D.income, LD.income
    regress gas L.gas price income
    matrix b11=e(b)
    scalar AIC11=log(e(rss)/e(N))+(colsof(b11)/e(N))*2
    scalar SIC11=log(e(rss)/e(N))+(colsof(b11)/e(N))*log(e(N))
    *
    * Model 1.12: Drop all lags and differences
    regress gas price income
    matrix b12=e(b)
    scalar AIC12=log(e(rss)/e(N))+(colsof(b12)/e(N))*2
    scalar SIC12=log(e(rss)/e(N))+(colsof(b12)/e(N))*log(e(N))
    *
    * List all calculated AICs and SICs
    scalar list
    clear
    *****************************END HERE**********************************************

With each penalty based on the model's own parameter count p, the criteria are (rounded to four decimals; n is the number of observations used by each regression):

    Model      n     p        AIC        SIC
    ----------------------------------------
    1.1      126     9    -8.2531    -8.0506
    1.2      126     8    -8.2683    -8.0882
    1.3      126     8    -8.2545    -8.0744
    1.4      126     8    -8.2364    -8.0563
    1.5      126     7    -8.2702    -8.1126
    1.6      126     7    -8.2522    -8.0947
    1.7      126     7    -8.2033    -8.0457
    1.8      127     6    -8.2246    -8.0903
    1.9      127     5    -8.2253    -8.1133
    1.10     127     5    -8.0079    -7.8960
    1.11     127     4    -7.9785    -7.8889
    1.12     128     3    -5.7028    -5.6360

For each criterion, the preferred model is the one with the smallest value.
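The do-file above repeats the same few lines for every model, which makes copy-and-paste slips easy. A more compact variant, sketched here under stated assumptions, stores each regressor list in a local macro, loops over the models, and takes the parameter count from e(rank) (equal to the number of columns of e(b) when no regressor is dropped for collinearity). The restriction `if t >= 3` is a design choice that is not in the original do-file: it forces every model onto the same 126 observations, so the criteria are compared on a common sample. To run without AUTO2.dta, the sketch simulates stand-in series named gas, price, and income; those data, the seed, and the macro names m1 to m3 are hypothetical.

```stata
* Hypothetical stand-in data so the sketch runs without AUTO2.dta
clear
set obs 128
set seed 508
gen t = _n
tsset t
gen double price  = rnormal()
gen double income = rnormal()
gen double gas    = 0.5*price + 0.3*income + rnormal()

* Candidate regressor lists (three of the twelve models above)
local m1 "L.gas LD.gas price D.price LD.price income D.income LD.income"
local m2 "L.gas price D.price income D.income"
local m3 "price income"

forvalues j = 1/3 {
    * Same estimation sample for every model: t = 3,...,128
    quietly regress gas `m`j'' if t >= 3
    scalar N`j'   = e(N)
    scalar P`j'   = e(rank)
    scalar AIC`j' = ln(e(rss)/e(N)) + (e(rank)/e(N))*2
    scalar SIC`j' = ln(e(rss)/e(N)) + (e(rank)/e(N))*ln(e(N))
    display "Model `j': n = " N`j' "  p = " P`j' ///
        "  AIC = " %9.5f AIC`j' "  SIC = " %9.5f SIC`j'
}
```

With the real data, replace the simulation block by `use AUTO2.dta, clear` followed by the `gen t` and `tsset t` lines, and add one local macro per model.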

  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu
