Welcome to e-Tutorial, your online help for Econ 508. This issue provides an introduction to model selection in econometrics, focusing on the Akaike (AIC) and Schwarz (SIC) information criteria.¹

Data Set

The data set used in this tutorial was borrowed from Johnston and DiNardo's Econometric Methods (1997, 4th ed), but slightly adjusted for your needs. It is called AUTO2. You can download the data by visiting the Econ 508 web site (Data). As you will see, this adapted data set contains five series.

    use AUTO2.dta, clear
    list in 1/10

     +-------------------------------------------------------+
     | quarter         gas      price      income      miles |
     |-------------------------------------------------------|
  1. |  1959.1   -8.015248    4.67575    -4.50524   2.647592 |
  2. |  1959.2    -8.01106   4.691292   -4.492739   2.647592 |
  3. |  1959.3   -8.019878   4.689134   -4.498873   2.647592 |
  4. |  1959.4   -8.012581   4.722338   -4.491904   2.647592 |
  5. |  1960.1   -8.016769    4.70747   -4.490103   2.647415 |
     |-------------------------------------------------------|
  6. |  1960.2   -7.976376   4.699136   -4.489107   2.647238 |
  7. |  1960.3   -7.997135    4.72129   -4.492301   2.647061 |
  8. |  1960.4   -8.005725   4.722736   -4.496271   2.646884 |
  9. |  1961.1   -8.009368   4.706207   -4.489013   2.648654 |
 10. |  1961.2   -7.989948   4.675196   -4.477735   2.650421 |
     +-------------------------------------------------------+

As before, we first need to declare the data as a time series:

    gen t = _n    
    label variable t "Integer time period"
    tsset t

        time variable:  t, 1 to 128
                delta:  1 unit
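
If you would rather have each observation labelled with its calendar quarter, a quarterly date can be declared instead of the plain counter. The lines below are only a sketch: the variable name qdate is made up for illustration, and the code assumes that the 128 observations are consecutive quarters starting in 1959q1, which is what the listing above suggests.

    * Hypothetical alternative to t: a quarterly date (assumes no gaps in the data)
    gen qdate = tq(1959q1) + _n - 1
    format qdate %tq
    tsset qdate

Lags and differences such as L.gas should behave the same under either declaration as long as the series has no gaps.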

Running a Generic Dynamic Model

In PS2, question 1, you are asked to run a simple dynamic model in the following autoregressive distributed lag form (the problem set uses a different data set from the one used here):

\[gas_{t} = a_{0} + a_{1} gas_{t-1} + a_{2} \Delta gas_{t-1} + a_{3} price_{t} + a_{4} \Delta price_{t} + a_{5} \Delta price_{t-1} + a_{6} income_{t} + a_{7} \Delta income_{t} + a_{8} \Delta income_{t-1} + \epsilon_{t}\]

In STATA, you can run this model as follows:

    regress gas L.gas LD.gas price D.price LD.price income D.income LD.income


      Source |       SS       df       MS              Number of obs =     126
-------------+------------------------------           F(  8,   117) =  863.16
       Model |  1.67892182     8  .209865228           Prob > F      =  0.0000
    Residual |  .028446793   117  .000243135           R-squared     =  0.9833
-------------+------------------------------           Adj R-squared =  0.9822
       Total |  1.70736862   125  .013658949           Root MSE      =  .01559

------------------------------------------------------------------------------
         gas |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gas |
         L1. |   .9721906   .0254024    38.27   0.000     .9218825    1.022499
         LD. |  -.1788088   .0907299    -1.97   0.051    -.3584947    .0008771
             |
       price |
         --. |  -.0183001   .0096211    -1.90   0.060    -.0373542     .000754
         D1. |  -.2359339   .0373382    -6.32   0.000    -.3098801   -.1619876
         LD. |   .0584094   .0445919     1.31   0.193    -.0299024    .1467213
             |
      income |
         --. |   .0082806   .0205162     0.40   0.687    -.0323507    .0489119
         D1. |   .2722332   .1549735     1.76   0.082    -.0346836    .5791501
         LD. |   .0446936   .1552938     0.29   0.774    -.2628576    .3522449
             |
       _cons |  -.0929674     .12155    -0.76   0.446    -.3336909    .1477561
------------------------------------------------------------------------------
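
It may help to see how the time-series operators in this command line up with the terms of the equation. The sketch below rebuilds two of the regressors by hand. The variable names gas_lag1 and dgas_lag1 are invented for this illustration, and the code assumes the data have already been declared with tsset as above.

    * L.gas is gas lagged one period: gas_{t-1}
    gen double gas_lag1 = L.gas
    * LD.gas is the lagged first difference: gas_{t-1} - gas_{t-2}
    gen double dgas_lag1 = L.gas - L2.gas
    * Check that the hand-built series agree with the operators
    assert gas_lag1 == L.gas
    assert reldif(dgas_lag1, LD.gas) < 1e-8 if !missing(dgas_lag1)
    * The first two observations have no lagged difference
    count if missing(LD.gas)

Because the LD. terms cost two observations, the regression uses 126 of the 128 observations, as reported in the output above.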

The model above is your benchmark. You should now start your model selection process.

Although commands exist that compute the Akaike and Schwarz criteria automatically, in Econ 508 it is recommended that you compute them by hand, as taught in class, using the formulae given in Prof. Koenker's Lecture Note 4:

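\[AIC_j=\log(\hat{\sigma}_j^2)+\frac{p_j}{n}\cdot 2\]
\[SIC_j=\log(\hat{\sigma}_j^2)+\frac{p_j}{n}\cdot\log(n)\]

Here, for model j, the variance estimate is the residual sum of squares divided by n (as computed in the commands below), p_j is the number of estimated parameters including the constant, and n is the sample size.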
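
As a worked example, take the benchmark regression above, which has a residual sum of squares of 0.028446793, n = 126 observations, and p = 9 parameters (eight slopes plus the constant). This assumes, as the commands in this tutorial do, that the variance estimate is the residual sum of squares divided by n and that log is the natural logarithm. Figures are rounded to four decimals.

\[\log(\hat{\sigma}^2)=\log\left(\frac{0.028446793}{126}\right)\approx -8.3960\]
\[AIC\approx -8.3960+\frac{9}{126}\cdot 2\approx -8.3960+0.1429=-8.2531\]
\[SIC\approx -8.3960+\frac{9}{126}\cdot\log(126)\approx -8.3960+0.3454=-8.0506\]

Since log(126) is about 4.84, which exceeds 2, SIC penalizes each additional parameter more heavily than AIC in this sample.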

In STATA, you can calculate various information criteria and other important statistics from the scalars and matrices that the regression command stores:

Sample size:	after regress, type	scalar A =	_result(1)	or	e(N)
Model SS:	after regress, type	scalar B =	_result(2)	or	e(mss)
Model DF:	after regress, type	scalar C =	_result(3)	or	e(df_m)
Residual SS:	after regress, type	scalar D =	_result(4)	or	e(rss)
Residual df:	after regress, type	scalar E =	_result(5)	or	e(df_r)
F-Statistic:	after regress, type	scalar F =	_result(6)	or	e(F)
R-Squared:	after regress, type	scalar G =	_result(7)	or	e(r2)
Adj. R-Squared:	after regress, type	scalar H =	_result(8)	or	e(r2_a)
Root MSE:	after regress, type	scalar I =	_result(9)	or	e(rmse)
Coefficients:	after regress, type	matrix b =	get(_b)	or	e(b)
# of parameters:	after getting the matrix b, type	scalar K =	colsof(b)
Covariance matrix:	after regress, type	matrix v =	get(VCE)	or	e(V)

It is not necessary to memorize these. You can always obtain the name for the saved output by typing "ret list" and/or "eret list" after the regression (or summarize).
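
For instance, the following lines show where those names come from. This is only a sketch, assuming the AUTO2 data are loaded and declared with tsset as above. The scalar names n_obs and rss_full are made up for illustration, and ereturn list and return list are the unabbreviated forms of the commands just mentioned.

    quietly regress gas L.gas LD.gas price D.price LD.price income D.income LD.income
    * Estimation commands store their results in e()
    ereturn list
    scalar n_obs = e(N)
    scalar rss_full = e(rss)
    * General commands such as summarize store their results in r()
    quietly summarize gas
    return list
    display "Mean of gas: " r(mean)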

    scalar list A B C D E F G H I K

         A =        126
         B =  1.6789218
         C =          8
         D =  .02844679
         E =        117
         F =  863.16345
         G =  .98333881
         H =  .98219958
         I =  .01559279
         K =          9

    matrix list b

 b[1,9]
             L.         LD.                      D.         LD.               
           gas         gas       price       price       price      income    
y1    .9721906  -.17880883  -.01830014  -.23593387   .05840944    .0082806

    matrix list v

                     L.         LD.                      D.         LD.            
                  gas         gas       price       price       price      income  
    L.gas   .00064528
   LD.gas  -.00044148   .00823192
    price    .0000944   7.738e-06   .00009257
  D.price  -.00013958   .00013756  -.00003085   .00139414
 LD.price  -.00013149   .00182434  -.00005647  -.00046606   .00198843
   income  -.00045971   .00041162  -.00008305   .00008961   .00015336   .00042091
 D.income   .00048852  -.00068779   .00018875   .00096361   .00057653  -.00023978  
LD.income   .00047985  -.00230927   .00018465  -.00049299   .00089038  -.00024082  
    _cons   .00263343  -.00174625  -.00005413   -.0005658  -.00012155  -.00141118

You can get the Akaike Information Criterion as follows:

    scalar AIC=log(_result(4)/_result(1))+(colsof(b)/_result(1))*2 
    scalar list AIC

       AIC = -8.2531446

You can get the Schwarz Information Criterion as follows:

    scalar SIC=log(_result(4)/_result(1))+(colsof(b)/_result(1))*log(_result(1))
    scalar list SIC

       SIC = -8.0505531
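
As a cross-check on the hand calculation, the built-in estat ic command reports information criteria on a different scale, AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(N). The sketch below assumes that regress stores in e(ll) the Gaussian log likelihood evaluated at RSS/N and that e(rank) counts the estimated coefficients including the constant. Under those assumptions, dividing by N and subtracting the constant 1 + ln(2*pi) should recover the values above. The scalar names AIC_ll and SIC_ll are made up for illustration.

    quietly regress gas L.gas LD.gas price D.price LD.price income D.income LD.income
    * Built-in AIC and BIC (likelihood scale)
    estat ic
    * Rescale to the lecture-note form (assumes e(ll) is the Gaussian log likelihood)
    scalar AIC_ll = (-2*e(ll) + 2*e(rank))/e(N) - (1 + ln(2*_pi))
    scalar SIC_ll = (-2*e(ll) + ln(e(N))*e(rank))/e(N) - (1 + ln(2*_pi))
    scalar list AIC_ll SIC_ll

The two scales rank models identically when every model is fitted to the same N observations.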

Programming in STATA, I:

How to Obtain Information Criteria

To help with model selection, copy and paste the following code into the do-file editor:

***************************START HERE**********************************************
* A small do-file to calculate AIC and SIC in STATA
* use "AUTO2.dta", clear
* gen t=_n
* label variable t "Integer time period"
* tsset t
*
* For each model, p is the number of columns of that model's own
* coefficient vector (slopes plus the constant), i.e. colsof(bK) for model K
*
* Model 1.1: Full Model
regress  gas  L.gas  LD.gas  price  D.price  LD.price  income  D.income  LD.income
matrix   b1=e(b)
scalar   AIC1=log(e(rss)/e(N))+(colsof(b1)/e(N))*2
scalar   SIC1=log(e(rss)/e(N))+(colsof(b1)/e(N))*log(e(N))
*
* Model 1.2: Drop LD.income
regress  gas  L.gas  LD.gas  price  D.price  LD.price  income  D.income
matrix   b2=e(b)
scalar   AIC2=log(e(rss)/e(N))+(colsof(b2)/e(N))*2
scalar   SIC2=log(e(rss)/e(N))+(colsof(b2)/e(N))*log(e(N))
*
* Model 1.3: Drop LD.price
regress  gas  L.gas  LD.gas  price  D.price income  D.income  LD.income
matrix   b3=e(b)
scalar   AIC3=log(e(rss)/e(N))+(colsof(b3)/e(N))*2
scalar   SIC3=log(e(rss)/e(N))+(colsof(b3)/e(N))*log(e(N))
*
* Model 1.4: Drop LD.gas
regress  gas  L.gas price  D.price  LD.price  income  D.income  LD.income
matrix   b4=e(b)
scalar   AIC4=log(e(rss)/e(N))+(colsof(b4)/e(N))*2
scalar   SIC4=log(e(rss)/e(N))+(colsof(b4)/e(N))*log(e(N))
*
* Model 1.5: Drop LD.price, LD.income
regress  gas  L.gas  LD.gas  price  D.price income  D.income 
matrix   b5=e(b)
scalar   AIC5=log(e(rss)/e(N))+(colsof(b5)/e(N))*2
scalar   SIC5=log(e(rss)/e(N))+(colsof(b5)/e(N))*log(e(N))
*
* Model 1.6: Drop LD.gas, LD.income
regress  gas  L.gas price  D.price  LD.price  income  D.income 
matrix   b6=e(b)
scalar   AIC6=log(e(rss)/e(N))+(colsof(b6)/e(N))*2
scalar   SIC6=log(e(rss)/e(N))+(colsof(b6)/e(N))*log(e(N))
*
* Model 1.7: Drop LD.gas, LD.price
regress  gas  L.gas price  D.price income  D.income  LD.income
matrix   b7=e(b)
scalar   AIC7=log(e(rss)/e(N))+(colsof(b7)/e(N))*2
scalar   SIC7=log(e(rss)/e(N))+(colsof(b7)/e(N))*log(e(N))
*
* Model 1.8: Drop LD.gas, LD.price, LD.income
regress  gas  L.gas price  D.price income  D.income
matrix   b8=e(b)
scalar   AIC8=log(e(rss)/e(N))+(colsof(b8)/e(N))*2
scalar   SIC8=log(e(rss)/e(N))+(colsof(b8)/e(N))*log(e(N))
*
* Model 1.9: Drop LD.gas, LD.price, D.income, LD.income
regress  gas  L.gas price  D.price income 
matrix   b9=e(b)
scalar   AIC9=log(e(rss)/e(N))+(colsof(b9)/e(N))*2
scalar   SIC9=log(e(rss)/e(N))+(colsof(b9)/e(N))*log(e(N))
*
* Model 1.10: Drop LD.gas, D.price, LD.price, LD.income 
regress  gas  L.gas price income  D.income
matrix   b10=e(b)
scalar   AIC10=log(e(rss)/e(N))+(colsof(b10)/e(N))*2
scalar   SIC10=log(e(rss)/e(N))+(colsof(b10)/e(N))*log(e(N))
*
* Model 1.11: Drop LD.gas, D.price, LD.price, D.income, LD.income 
regress  gas L.gas price income 
matrix   b11=e(b)
scalar   AIC11=log(e(rss)/e(N))+(colsof(b11)/e(N))*2
scalar   SIC11=log(e(rss)/e(N))+(colsof(b11)/e(N))*log(e(N))
*
* Model 1.12: Drop all lags and differences
regress  gas price income 
matrix   b12=e(b)
scalar   AIC12=log(e(rss)/e(N))+(colsof(b12)/e(N))*2
scalar   SIC12=log(e(rss)/e(N))+(colsof(b12)/e(N))*log(e(N))
*
* List all calculated AICs and SICs
scalar list
clear
*****************************END HERE**********************************************

With each model penalized by its own number of parameters, the do-file should list the following values (rounded here to four decimal places):

     SIC12 = -5.6360
     AIC12 = -5.7028
     SIC11 = -7.8889
     AIC11 = -7.9785
     SIC10 = -7.8960
     AIC10 = -8.0079
      SIC9 = -8.1133
      AIC9 = -8.2253
      SIC8 = -8.0903
      AIC8 = -8.2246
      SIC7 = -8.0457
      AIC7 = -8.2033
      SIC6 = -8.0947
      AIC6 = -8.2522
      SIC5 = -8.1126
      AIC5 = -8.2702
      SIC4 = -8.0563
      AIC4 = -8.2364
      SIC3 = -8.0744
      AIC3 = -8.2545
      SIC2 = -8.0882
      AIC2 = -8.2683
      SIC1 = -8.0506
      AIC1 = -8.2531
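
The same calculation can be written more compactly with a small program and a loop. What follows is a sketch rather than part of the course material: the program name showic and the local macro names m1, m5 and m9 are invented here, and only three of the twelve models are listed. The condition if t >= 3 is an added assumption that restricts every model to the 126 observations available to the full model, so that the criteria are computed on a common sample; values for models without LD. terms may therefore differ from those listed by the do-file. The code also assumes that e(rank) equals the number of estimated coefficients including the constant, and that the data are still in memory (run it before the clear at the end of the do-file, and run all lines together so the local macros are defined).

    capture program drop showic
    program define showic
        * Hypothetical helper: prints n, p, AIC and SIC for the most recent regress
        tempname aic sic
        scalar `aic' = ln(e(rss)/e(N)) + 2*e(rank)/e(N)
        scalar `sic' = ln(e(rss)/e(N)) + ln(e(N))*e(rank)/e(N)
        display "n = " e(N) "  p = " e(rank) "  AIC = " %9.4f `aic' "  SIC = " %9.4f `sic'
    end

    * Regressor lists for three of the models above
    local m1 "L.gas LD.gas price D.price LD.price income D.income LD.income"
    local m5 "L.gas LD.gas price D.price income D.income"
    local m9 "L.gas price D.price income"
    foreach k in 1 5 9 {
        * Common sample: t >= 3 (assumption, see the text above)
        quietly regress gas `m`k'' if t >= 3
        display "Model 1.`k': " _continue
        showic
    }

Adding another model then only requires one more local macro and one more entry in the foreach list.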

Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu

Contact	Office Hours	E-mail
Prof. Roger Koenker	M. & W. 2:30-3:30 or by appointment (126 DKH)	rkoenker@illinois.edu
TA Nicolas Bottan	TBA	bottan2@illinois.edu

Applied Econometrics
Econ 508 - Fall 2014

Professor: Roger Koenker

TA: Nicolas Bottan

e-TA 4: Model Selection and Information Criteria
