## TA: Nicolas Bottan

Welcome to e-Tutorial, your online help for Econ 508. This issue provides an introduction to model selection in econometrics, focusing on the Akaike (AIC) and Schwarz (SIC) Information Criteria.

# Data Set

The data set used in this tutorial was borrowed from Johnston and DiNardo's Econometric Methods (1997, 4th ed), but slightly adjusted for your needs. It is called AUTO2. You can download the data by visiting the Econ 508 web site (Data). As you will see, this adapted data set contains five series.

    use AUTO2.dta, clear
    list in 1/10

         +-------------------------------------------------------+
         | quarter         gas      price      income      miles |
         |-------------------------------------------------------|
      1. |  1959.1   -8.015248    4.67575    -4.50524   2.647592 |
      2. |  1959.2    -8.01106   4.691292   -4.492739   2.647592 |
      3. |  1959.3   -8.019878   4.689134   -4.498873   2.647592 |
      4. |  1959.4   -8.012581   4.722338   -4.491904   2.647592 |
      5. |  1960.1   -8.016769    4.70747   -4.490103   2.647415 |
         |-------------------------------------------------------|
      6. |  1960.2   -7.976376   4.699136   -4.489107   2.647238 |
      7. |  1960.3   -7.997135    4.72129   -4.492301   2.647061 |
      8. |  1960.4   -8.005725   4.722736   -4.496271   2.646884 |
      9. |  1961.1   -8.009368   4.706207   -4.489013   2.648654 |
     10. |  1961.2   -7.989948   4.675196   -4.477735   2.650421 |
         +-------------------------------------------------------+

As before, we first need to declare the data as a time series:

    gen t = _n
    label variable t "Integer time period"
    tsset t

            time variable:  t, 1 to 128
                    delta:  1 unit

# Running a Generic Dynamic Model

In PS2, question 1, you are asked to run a simple dynamic model on that problem set's data (which differ from the data used here), in the following autoregressive distributed lag form:

$gas_{t} = a_{0} + a_{1} gas_{t-1} + a_{2} \Delta gas_{t-1} + a_{3} price_{t} + a_{4} \Delta price_{t} + a_{5} \Delta price_{t-1} + a_{6} income_{t} + a_{7} \Delta income_{t} + a_{8} \Delta income_{t-1} + \epsilon_{t}$

In STATA, you can run this model as follows:

    regress gas L.gas LD.gas price D.price LD.price income D.income LD.income

          Source |       SS       df       MS              Number of obs =     126
    -------------+------------------------------           F(  8,   117) =  863.16
           Model |  1.67892182     8  .209865228           Prob > F      =  0.0000
        Residual |  .028446793   117  .000243135           R-squared     =  0.9833
    -------------+------------------------------           Adj R-squared =  0.9822
           Total |  1.70736862   125  .013658949           Root MSE      =  .01559

    ------------------------------------------------------------------------------
             gas |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             gas |
             L1. |   .9721906   .0254024    38.27   0.000     .9218825    1.022499
             LD. |  -.1788088   .0907299    -1.97   0.051    -.3584947    .0008771
                 |
           price |
             --. |  -.0183001   .0096211    -1.90   0.060    -.0373542     .000754
             D1. |  -.2359339   .0373382    -6.32   0.000    -.3098801   -.1619876
             LD. |   .0584094   .0445919     1.31   0.193    -.0299024    .1467213
                 |
          income |
             --. |   .0082806   .0205162     0.40   0.687    -.0323507    .0489119
             D1. |   .2722332   .1549735     1.76   0.082    -.0346836    .5791501
             LD. |   .0446936   .1552938     0.29   0.774    -.2628576    .3522449
                 |
           _cons |  -.0929674     .12155    -0.76   0.446    -.3336909    .1477561
    ------------------------------------------------------------------------------
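
The time-series operators in this command correspond to the terms of the equation: L. is a one-period lag, D. a first difference, and LD. a first difference lagged one period. A minimal sketch to check this by hand, assuming the data are loaded and tsset as above; the variable names gas_lag1 and dgas_lag1 are illustrative only and do not belong to the data set:

```stata
* Assumes AUTO2.dta is in memory and -tsset t- has been run.
* gas_lag1 and dgas_lag1 are hypothetical names used only for this check.
generate double gas_lag1  = L.gas
generate double dgas_lag1 = L.gas - L2.gas

* LD.gas is the first difference lagged one period, so it should match
* the hand-built series wherever both are defined (from t = 3 onward).
assert reldif(dgas_lag1, LD.gas) < 1e-8 if !missing(LD.gas)

list t gas gas_lag1 dgas_lag1 in 1/5
```

Because LD. terms need two earlier observations, the regression uses 126 of the 128 quarters.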

The model above is your benchmark. You should now start your model selection process.

Although commands exist to calculate the Akaike and Schwarz criteria, in Econ 508 you are encouraged to compute them by hand, as taught in class, using the formulae given in Prof. Koenker's Lecture Note 4:

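$AIC_j=\log(\hat{\sigma}_j^2)+\frac{p_j}{n} \cdot 2$

$SIC_j=\log(\hat{\sigma}_j^2)+\frac{p_j}{n} \cdot \log(n)$

Here $\hat{\sigma}_j^2$ is the estimated error variance of model $j$ (computed below as the residual sum of squares divided by $n$), $p_j$ is the number of estimated parameters in model $j$, and $n$ is the sample size.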
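
As a worked check, plug in the benchmark regression reported above: residual SS 0.028446793, $n = 126$ and $p = 9$ coefficients (eight regressors plus the constant). The variance estimate is taken to be the residual SS divided by $n$, the convention used in the STATA code of this tutorial, and the figures are rounded to four decimals:

$\hat{\sigma}^2 = \frac{0.028446793}{126} \approx 2.2577 \times 10^{-4}, \qquad \log(\hat{\sigma}^2) \approx -8.3960$

$AIC \approx -8.3960 + \frac{9}{126} \cdot 2 \approx -8.3960 + 0.1429 = -8.2531$

$SIC \approx -8.3960 + \frac{9}{126} \cdot \log(126) \approx -8.3960 + 0.3454 = -8.0506$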

In STATA, you can calculate various information criteria and other important statistics using functions to extract matrices and scalars generated by the regression operation:

- Sample size: after regress, type scalar A = _result(1) or e(N)
- Model SS: after regress, type scalar B = _result(2) or e(mss)
- Model DF: after regress, type scalar C = _result(3) or e(df_m)
- Residual SS: after regress, type scalar D = _result(4) or e(rss)
- Residual df: after regress, type scalar E = _result(5) or e(df_r)
- F-Statistic: after regress, type scalar F = _result(6) or e(F)
- R-Squared: after regress, type scalar G = _result(7) or e(r2)
- Adj. R-Squared: after regress, type scalar H = _result(8) or e(r2_a)
- Root MSE: after regress, type scalar I = _result(9) or e(rmse)
- Coefficients: after regress, type matrix b = get(_b) or e(b)
- # of parameters: after getting the matrix b, type scalar K = colsof(b)
- Covariance matrix: after regress, type matrix v = get(VCE) or e(V)

It is not necessary to memorize these. You can always obtain the name for the saved output by typing "ret list" and/or "eret list" after the regression (or summarize).

    scalar list A B C D E F G H I K
             A =        126
             B =  1.6789218
             C =          8
             D =  .02844679
             E =        117
             F =  863.16345
             G =  .98333881
             H =  .98219958
             I =  .01559279
             K =          9
    matrix list b

    b[1,9]
                L.         LD.                      D.         LD.
               gas         gas       price       price       price      income
    y1    .9721906  -.17880883  -.01830014  -.23593387   .05840944    .0082806

    matrix list v

                       L.         LD.                      D.         LD.
                      gas         gas       price       price       price      income
        L.gas   .00064528
       LD.gas  -.00044148   .00823192
        price    .0000944   7.738e-06   .00009257
      D.price  -.00013958   .00013756  -.00003085   .00139414
     LD.price  -.00013149   .00182434  -.00005647  -.00046606   .00198843
       income  -.00045971   .00041162  -.00008305   .00008961   .00015336   .00042091
     D.income   .00048852  -.00068779   .00018875   .00096361   .00057653  -.00023978
    LD.income   .00047985  -.00230927   .00018465  -.00049299   .00089038  -.00024082
        _cons   .00263343  -.00174625  -.00005413   -.0005658  -.00012155  -.00141118
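
The e() names in the list above can be used on their own, without the older _result() functions. The sketch below assumes the data are loaded and tsset as above; the scalar and matrix names (nobs, resid_ss, npar, bhat) are illustrative and were chosen so that they do not coincide with an abbreviation of a variable name such as price or gas, which STATA would otherwise read as the variable:

```stata
* Assumes AUTO2.dta is in memory and -tsset t- has been run.
quietly regress gas L.gas LD.gas price D.price LD.price income D.income LD.income

* Show every e() scalar, macro and matrix left behind by regress
ereturn list

* nobs, resid_ss, bhat and npar are illustrative names only
scalar nobs     = e(N)
scalar resid_ss = e(rss)
matrix bhat     = e(b)
scalar npar     = colsof(bhat)

* Two identities that tie the saved results together:
* number of coefficients = model df + 1 (the constant)
assert npar == e(df_m) + 1
* Root MSE squared = residual SS / residual df
assert reldif(e(rmse)^2, resid_ss/e(df_r)) < 1e-6

scalar list nobs resid_ss npar
```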

You can get the Akaike Information Criterion as follows:

    scalar AIC=log(_result(4)/_result(1))+(colsof(b)/_result(1))*2
    scalar list AIC
           AIC = -8.2531446

You can get the Schwarz Information Criterion as follows:

    scalar SIC=log(_result(4)/_result(1))+(colsof(b)/_result(1))*log(_result(1))
    scalar list SIC
           SIC = -8.0505531
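
For comparison, the built-in postestimation command estat ic reports likelihood-based criteria, AIC = -2 lnL + 2p and BIC = -2 lnL + p log(n), which are on a different scale from the hand formulae. For a linear regression with Gaussian likelihood the two versions should differ only by the factor n and the constant 1 + log(2π), so they rank models estimated on the same sample identically. A minimal sketch of that link, assuming the data are loaded and tsset as above; the scalar names are illustrative:

```stata
* Assumes AUTO2.dta is in memory and -tsset t- has been run.
quietly regress gas L.gas LD.gas price D.price LD.price income D.income LD.income

* Hand formula from the lecture note (npar, aic_hand, aic_ll, aic_back
* are illustrative names, not built-in results)
scalar npar     = e(df_m) + 1
scalar aic_hand = log(e(rss)/e(N)) + 2*npar/e(N)

* Likelihood-based version built from the saved log likelihood e(ll)
scalar aic_ll   = -2*e(ll) + 2*npar

* Dividing by n and removing 1 + log(2*pi) should recover the hand value
scalar aic_back = aic_ll/e(N) - (1 + log(2*_pi))
display "by hand: " aic_hand "   recovered from e(ll): " aic_back
assert reldif(aic_hand, aic_back) < 1e-6

* Built-in table of the likelihood-based AIC and BIC
estat ic
```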

# How to Obtain Information Criteria

To help with model selection, copy and paste the following code into the do-file editor:

    ***************************START HERE********************************************
    * A small do-file to calculate AIC and SIC in STATA
    * For model K, the parameter count is colsof(bK), the number of columns
    * of that model's own coefficient vector (constant included)
    * use "AUTO2.dta", clear
    * gen t=_n
    * label variable t "Integer time period"
    * tsset t
    *
    * Model 1.1: Full Model
    regress  gas  L.gas  LD.gas  price  D.price  LD.price  income  D.income  LD.income
    matrix   b1=e(b)
    scalar   AIC1=log(e(rss)/e(N))+(colsof(b1)/e(N))*2
    scalar   SIC1=log(e(rss)/e(N))+(colsof(b1)/e(N))*log(e(N))
    *
    * Model 1.2: Drop LD.income
    regress  gas  L.gas  LD.gas  price  D.price  LD.price  income  D.income
    matrix   b2=e(b)
    scalar   AIC2=log(e(rss)/e(N))+(colsof(b2)/e(N))*2
    scalar   SIC2=log(e(rss)/e(N))+(colsof(b2)/e(N))*log(e(N))
    *
    * Model 1.3: Drop LD.price
    regress  gas  L.gas  LD.gas  price  D.price  income  D.income  LD.income
    matrix   b3=e(b)
    scalar   AIC3=log(e(rss)/e(N))+(colsof(b3)/e(N))*2
    scalar   SIC3=log(e(rss)/e(N))+(colsof(b3)/e(N))*log(e(N))
    *
    * Model 1.4: Drop LD.gas
    regress  gas  L.gas  price  D.price  LD.price  income  D.income  LD.income
    matrix   b4=e(b)
    scalar   AIC4=log(e(rss)/e(N))+(colsof(b4)/e(N))*2
    scalar   SIC4=log(e(rss)/e(N))+(colsof(b4)/e(N))*log(e(N))
    *
    * Model 1.5: Drop LD.price, LD.income
    regress  gas  L.gas  LD.gas  price  D.price  income  D.income
    matrix   b5=e(b)
    scalar   AIC5=log(e(rss)/e(N))+(colsof(b5)/e(N))*2
    scalar   SIC5=log(e(rss)/e(N))+(colsof(b5)/e(N))*log(e(N))
    *
    * Model 1.6: Drop LD.gas, LD.income
    regress  gas  L.gas  price  D.price  LD.price  income  D.income
    matrix   b6=e(b)
    scalar   AIC6=log(e(rss)/e(N))+(colsof(b6)/e(N))*2
    scalar   SIC6=log(e(rss)/e(N))+(colsof(b6)/e(N))*log(e(N))
    *
    * Model 1.7: Drop LD.gas, LD.price
    regress  gas  L.gas  price  D.price  income  D.income  LD.income
    matrix   b7=e(b)
    scalar   AIC7=log(e(rss)/e(N))+(colsof(b7)/e(N))*2
    scalar   SIC7=log(e(rss)/e(N))+(colsof(b7)/e(N))*log(e(N))
    *
    * Model 1.8: Drop LD.gas, LD.price, LD.income
    regress  gas  L.gas  price  D.price  income  D.income
    matrix   b8=e(b)
    scalar   AIC8=log(e(rss)/e(N))+(colsof(b8)/e(N))*2
    scalar   SIC8=log(e(rss)/e(N))+(colsof(b8)/e(N))*log(e(N))
    *
    * Model 1.9: Drop LD.gas, LD.price, D.income, LD.income
    regress  gas  L.gas  price  D.price  income
    matrix   b9=e(b)
    scalar   AIC9=log(e(rss)/e(N))+(colsof(b9)/e(N))*2
    scalar   SIC9=log(e(rss)/e(N))+(colsof(b9)/e(N))*log(e(N))
    *
    * Model 1.10: Drop LD.gas, D.price, LD.price, LD.income
    regress  gas  L.gas  price  income  D.income
    matrix   b10=e(b)
    scalar   AIC10=log(e(rss)/e(N))+(colsof(b10)/e(N))*2
    scalar   SIC10=log(e(rss)/e(N))+(colsof(b10)/e(N))*log(e(N))
    *
    * Model 1.11: Drop LD.gas, D.price, LD.price, D.income, LD.income
    regress  gas  L.gas  price  income
    matrix   b11=e(b)
    scalar   AIC11=log(e(rss)/e(N))+(colsof(b11)/e(N))*2
    scalar   SIC11=log(e(rss)/e(N))+(colsof(b11)/e(N))*log(e(N))
    *
    * Model 1.12: Drop all lags and differences
    regress  gas  price  income
    matrix   b12=e(b)
    scalar   AIC12=log(e(rss)/e(N))+(colsof(b12)/e(N))*2
    scalar   SIC12=log(e(rss)/e(N))+(colsof(b12)/e(N))*log(e(N))
    *
    * List all calculated AICs and SICs
    scalar list
    clear
    *****************************END HERE********************************************

The final scalar list reports the criteria for the twelve models (shown here to four decimal places):

         SIC12 =  -5.6360
         AIC12 =  -5.7028
         SIC11 =  -7.8889
         AIC11 =  -7.9785
         SIC10 =  -7.8960
         AIC10 =  -8.0079
          SIC9 =  -8.1133
          AIC9 =  -8.2253
          SIC8 =  -8.0903
          AIC8 =  -8.2246
          SIC7 =  -8.0457
          AIC7 =  -8.2033
          SIC6 =  -8.0947
          AIC6 =  -8.2522
          SIC5 =  -8.1126
          AIC5 =  -8.2702
          SIC4 =  -8.0563
          AIC4 =  -8.2364
          SIC3 =  -8.0744
          AIC3 =  -8.2545
          SIC2 =  -8.0882
          AIC2 =  -8.2683
          SIC1 =  -8.0506
          AIC1 =  -8.2531
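
The repeated regress, matrix and scalar steps of the do-file can also be wrapped in a small program. The following is only a sketch under stated assumptions: ic_by_hand is a hypothetical helper name (not a built-in STATA command), the data are assumed to be loaded and tsset as above, and the parameter count is read from the width of each model's own coefficient vector.

```stata
* ic_by_hand is a hypothetical helper, not a built-in STATA command.
capture program drop ic_by_hand
program define ic_by_hand, rclass
    * Everything typed after the program name is passed on to regress
    quietly regress `0'
    tempname b
    matrix `b' = e(b)
    * Parameter count comes from this model's own coefficient vector
    local p = colsof(`b')
    return scalar N   = e(N)
    return scalar p   = `p'
    return scalar aic = log(e(rss)/e(N)) + 2*`p'/e(N)
    return scalar sic = log(e(rss)/e(N)) + `p'*log(e(N))/e(N)
end

* Benchmark model from this tutorial
ic_by_hand gas L.gas LD.gas price D.price LD.price income D.income LD.income
display "n = " r(N) "  p = " r(p) "  AIC = " %9.4f r(aic) "  SIC = " %9.4f r(sic)
```

For the benchmark model this should reproduce the AIC of -8.2531 and SIC of -8.0506 computed earlier; any other specification can be passed the same way, for example ic_by_hand gas L.gas price income.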