Econ 508 Econometrics Group

Applied Econometrics
Econ 508 - Fall 2007

e-Tutorial 6: Delta-Method and Bootstrap Techniques

Welcome to the sixth issue of e-Tutorial, the on-line help to Econ 508. This issue provides an introduction on how to do the pratical works about the Delta-method and bootstrap in Stata and R. Hope this will be helpful for your further understanding of Prof. Koenker's Lecture 5 as well as the Question 4 of PS 2.

I. Subtract Information from Tables in Stata and R.

Running a Dynamic Model with Quadratic and Multiplicative Terms
In the problem set 2, question 4, you are asked to run a linear regression model with non-linear transformation of variables. Suppose for a moment that you have in your hands a data set like the one used here (auto2.dta), and would like to estimate an equation similar to the problem set:

gast = b0 + b1 incomet + b2 pricet + b3 (pricet)2 + b4 (pricet*incomet) + ut

(A) Stata

In Stata, you need first to generate the quadranic and other terms as follows:

gen price2= price^2
gen priceinc= price*income

then run regressing:

regress gas income price price2 priceinc

Now, suppose you are asked to calculate the price elasticity of demand at different points of the sample. To do that you will need to:

i) Obtain the coefficients of regression:

matrix B=get(_b)
matrix list B

B[1,5]
..........

ii) Extract the coefficients from the matrix B:

scalar b1=_coef[income]
scalar b2=_coef[price]
scalar b3=_coef[price2]
scalar b4=_coef[priceinc]
scalar b0=_coef[_cons]
scalar list b1 b2 b3 b4 b0

Or another way to do it: scalar b1=B[1,1]

iii) Calculate the elasticity at different points of the sample:

gen elastpt=b2+2*b3*price+b4*income
summarize  elastpt, detail

iv) Ok. You've just created your elasticity series. Now you can study it at different points of the sample, plot it against year  (check for inter-temporal structural breaks), against price, and against income.

graph elastpt  quarter, ylabel xlabel(1960.1 (5) 1990.1) t1title(Price Elasticity vs. Time)

graph elastpt  gas, ylabel xlabel t1title(Price Elasticity vs. Gas Expenditure)

graph elastpt price, ylabel xlabel t1title(Price Elasticity vs. Gas Price)

graph elastpt income, ylabel xlabel t1title(Price Elasticity vs. Per Capita Income)

Remember that those are point estimates. To compute confidence intervals, you will need special techniques, as I'll show below (Delta-method and Bootstrap).

Continuing with the problem set 2, question 4, you should:

a) Interpret the implications of the model.

b) Calculate the price such that [b2+2*b3*price+b4*income] =  -1
Given the formula of elasticity, and assuming income=x0, just find the optimal price. Call it pstar.

c) Examine the partial residual plots (see Appendix A)

d) Finally, in the last part of the question 4 you are asked to estimate model 2 and to compute the revenue maximizing price level assuming income=log(15)=2.708050 approximately. This is a straightforward computation. The interesting part come with the application of the delta method and the bootstrap to achieve reasonable confidence intervals for the optimal price.

Standard Errors and Confidence Intervals for Non-linear Estimates:

Delta-Method:You are expected to sketch the Delta-method and calculate the derivatives by hand along with the computational routine below.

1) Start running the full model again:
regress gas income price price2 priceinc

Source |       SS       df       MS                  Number of obs =     128
---------+------------------------------               F(  4,   123) =  117.59
Model |  1.45421455     4  .363553638               Prob > F      =  0.0000
Residual |   .38027632   123  .003091677               R-squared     =  0.7927
Total |  1.83449087   127   .01444481               Root MSE      =   .0556

------------------------------------------------------------------------------
gas |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
income |  -6.109449   2.750048     -2.222   0.028        -11.553   -.6658979
price |   3.090076   2.894747      1.067   0.288      -2.639898     8.82005
price2 |   .2839261   .1693137      1.677   0.096        -.05122    .6190723
priceinc |   1.454257   .5879186      2.474   0.015        .290508    2.618006
_cons |  -25.52231    11.8968     -2.145   0.034       -49.0713   -1.973313
------------------------------------------------------------------------------

2) Recall that you have stored those coefficients under the names [b1, b2 , b3, b4, b0]. Now you need to get the covariance matrix V of them:

matrix V=get(VCE)
matrix list V

Please note the order of the covariance matrix: the first column corresponds to the coefficient of income, the second to the coefficient of price, the third to the coefficient of price2, the fourth to the coefficient of priceinc, and the last to the coefficient of the intercept _cons. See the results below:

symmetric V[5,5]
income       price      price2    priceinc       _cons
income   7.5627648
price  -6.5629988   8.3795596
price2  -.00833388  -.27004512   .02866714
priceinc  -1.6166822    1.405926   .00149168   .34564825
_cons    30.88608  -33.229023   .62976256  -6.6089597   141.53396

3) Next you need to input pstar:
scalar pstar = -(1+b2+b4*ln(15))/(2*b3)
scalar list pstar

4) And then you need to obtain the gradient vector with the partial derivatives of pstar (optimal price) with respect to each regressor, obeying the order of the covariance matrix:

G'= (dpstar/d_coef[income],
dpstar/d_coef[price],
dpstar/d_coef[price2],
dpstar/d_coef[priceinc],
dpstar/d_coef[ _cons])

which is the same as  G'= (dpstar/db1 , dpstar/db2, dpstar/db3, dpstar/db4, dpstar/db0)

In STATA the vector will be as follows:

matrix G=( 0  \  -1/(2*b3)  \  (1+b2+b4*ln(15))/(2*b3*b3)  \  -ln(15)/(2*b3)  \  0)
matrix list G

This corresponds to the gradient vector. The backslashes separate each row of the vector. The results are as follows:

G[5,1]
c1
r1           0
r2  -1.7610213
r3   49.794522
r4  -4.7689342
r5           0

5) The next step is to obtain the estimated variance of the optimal price, G'VG:

matrix GVG=G'*V*G
matrix list GVG

symmetric GVG[1,1]
c1
c1  175.19377

6) Then you can obtain the standard error by taking the square root of this scalar value:
scalar sepstar=sqrt(el(GVG,1,1))
scalar list sepstar

sepstar =  13.236079

That's it. The variable sepstar  is the standard error of pstar.

7) The last step is to construct your confidence interval, and put the variables in levels, as follows:

scalar CIupper=pstar + 1.96*sepstar
scalar CIlower=pstar - 1.96*sepstar
scalar list CIlower pstar CIupper
CIlower =  -40.08068
pstar = -14.137967
CIupper =  11.804747

This will give you the confidence interval of the optimal price level. Observe that the results you obtained are for prices in logs. You can try to get the respective results in levels as well, but it is worth to think about whether you can do that directly or you need some additional step because of the non-linearity of the point estimate.

Bootstrap: You are expected to explain the various Bootstrap techniques in words (as if you are explaining for a non-econometrician), along with the complete understanding of the results provided by STATA.

In STATA, you can calculate parametric (Normal), percentiled, and bias corrected bootstrapped confidence intervals for the optimal price as follows:

set seed 1
bs "reg gas income price price2 priceinc" "(-1)*(1+_coef[price]+_coef[priceinc]*ln(15))/(2*_coef[price2])", reps(10000)

command:     reg gas income price price2 priceinc
statistic:   (-1)*(1+_coef[price]+_coef[priceinc]*ln(15))/(2*_coef[price2])
(obs=128)

Bootstrap statistics

Variable |   Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
---------+-------------------------------------------------------------------
bs1 |  10000  -14.13797   10.83535   3593.234   -7057.599  7029.323  (N)
|                                           -166.4773  137.5923  (P)
|                                           -292.9886  46.90307 (BC)
-----------------------------------------------------------------------------
N = normal, P = percentile, BC = bias-corrected

In the command above, you use set seed 1 to assure reproducibility of your results. It means that, instead of generating a different initial random number every time you run the bootstrap, STATA will use the same seed, i.e., same initial random number (equals 1 in this case), for the iteration process.  The first expression in quotation marks is the estimator, while the second quoted expression is the estimate you are interested.  reps (10000) means that STATA will repeat the process 10000 times, to give  you the confidence interval.

Recall the generated statistic is in log form. A good question is whether you should present your Delta-method and Bootstrap confidence interval in logs or in levels... Think about it.

Appendix A: Partial Residual Plot Revisited

It  is importante to mention that all results presented here are based on a different data set (auto2.dta) than the data set used on the problem set 2 (gasnew.dat):

I) Use the partial residual plot to check on the effect of the quadratic term. To obtain the partial residual plot with respect to the quadratic term you should :

I.1) Estimate model (2) without the regressor price2, and call this model (2.1)
regress gas income price priceinc

Source |       SS       df       MS                  Number of obs =     128
---------+------------------------------               F(  3,   124) =  153.61
Model |  1.44552053     3  .481840178               Prob > F      =  0.0000
Residual |  .388970339   124  .003136858               R-squared     =  0.7880
Total |  1.83449087   127   .01444481               Root MSE      =  .05601

------------------------------------------------------------------------------
gas |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
income |  -6.026909   2.769625     -2.176   0.031      -11.50877   -.5450443
price |   5.764667   2.433312      2.369   0.019       .9484606    10.58087
priceinc |   1.439483   .5921323      2.431   0.016       .2674872    2.611478
_cons |  -31.75963   11.38268     -2.790   0.006      -54.28914   -9.230113
------------------------------------------------------------------------------

I.2) Obtain the residuals of the model (2.1):
predict gasres, resid

I.3) Estimate model (2) using price2 instead of gas as the dependent variable; call it model (2.2):
regress price2 income price priceinc

Source |       SS       df       MS                  Number of obs =     128
---------+------------------------------               F(  3,   124) =       .
Model |  324.908293     3  108.302764               Prob > F      =  0.0000
Residual |  .107847434   124  .000869737               R-squared     =  0.9997
Total |  325.016141   127  2.55918221               Root MSE      =  .02949

------------------------------------------------------------------------------
price2 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
income |   .2907119    1.45837      0.199   0.842      -2.595811    3.177234
price |   9.420023   1.281281      7.352   0.000       6.884009    11.95604
priceinc |  -.0520346   .3117923     -0.167   0.868      -.6691589    .5650897
_cons |   -21.9681   5.993648     -3.665   0.000      -33.83121   -10.10499
------------------------------------------------------------------------------

I.4) Obtain the residuals of the model (2.2):
predict pric2res, resid

I.5) Run the Gauss-Frisch-Waugh "regression", and check if the slope coefficient is the same as in the original model (2):
regress gasres pric2res

Source |       SS       df       MS                  Number of obs =     128
---------+------------------------------               F(  1,   126) =    2.88
Model |  .008694019     1  .008694019               Prob > F      =  0.0921
Residual |  .380276321   126  .003018066               R-squared     =  0.0224
Total |   .38897034   127  .003062759               Root MSE      =  .05494

------------------------------------------------------------------------------
gasres |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
pric2res |   .2839261   .1672859      1.697   0.092      -.0471278    .6149801
_cons |   2.57e-10   .0048558      0.000   1.000      -.0096095    .0096095
------------------------------------------------------------------------------

I.6) Plot the the partial residuals, with a fitting line of predicted values (here called gfw):
predict gfw
graph gasres gfw pric2res, title(Partial Residuals) rlab(0) c(.l) s(oi) l1((gas)) b2((price2)) sort

II) Check whether there is a linear relationship between the residuals of the model (2.1) and the residuals of model (2.2). Draw your conclusion from what you see in the graph, and try to justify your answer in the light of basic assumptions of linear regression.

USING R:

Delta-Method:

1) Infile the data, transform them in time series, and run the full model

attach(d.gas)

gas<-ts(gas,start=1959,frequency=4)

price<-ts(price,start=1959,frequency=4)

income<-ts(income,start=1959,frequency=4)

miles<-ts(miles,start=1959,frequency=4)

price2<-price^2

princ<-price*income

f<-lm(gas~income+price+price2+princ)

summary(f)

coefs<-f\$coefficients

2) Recall that you have stored those coefficients under the names [b0, b1, b2 , b3, b4] anf constant term comes first. Now you need to get the covariance matrix V of them

vc<-vcov(f)

3) Next you need to input pstar:

pstar <- (-(1+coefs[3]+coefs[5]*log(15))/(2*coefs[4]) )

pstar

4) And then you need to obtain the gradient vector with the partial derivatives of pstar (optimal price) with respect to each regressor, obeying the order of the covariance matrix:

G'= (dpstar/d_coef[ _cons]

dpstar/d_coef[income],
dpstar/d_coef[price],
dpstar/d_coef[price2],
dpstar/d_coef[priceinc], )

which is the same as  G'= (dpstar/db0,dpstar/db1 , dpstar/db2, dpstar/db3, dpstar/db4)

In R one could do this as follows:

First generate a vector containig the values

g<-c(0,0,-1/(2*coefs[4]),(1+coefs[3]+coefs[5]*log(15))/(2*coefs[4]*coefs[4]),

-log(15)/(2*coefs[4]))

g

Then generate the matrix

G<-matrix(g,ncol=1,nrow=5)

G

> G

[,1]

[1,]  0.000000

[2,]  0.000000

[3,] -1.761006

[4,] 49.792894

[5,] -4.768893

5) The next step is to obtain the estimated variance of the optimal price, G'VG:

Var<-t(G)%*%vc%*%G

Var

[,1]

[1,] 175.1848

6) Then you can obtain the standard error by taking the square root of this scalar value:

se.pstar<-sqrt(Var)

se.pstar

> se.pstar

[,1]

[1,] 13.23574

That's it. The variable se.pstar  is the standard error of pstar.

7) The last step is to construct your confidence interval, and put the variables in levels, as follows:

Ciupper<-pstar + 1.96*se.pstar
Cilower<-pstar - 1.96*se.pstar
Ciupper

pstar

Cilower

> Ciupper

[,1]

[1,] 11.80442

> pstar

price

-14.13763

> Cilower

[,1]

[1,] -40.07967

Bootstrap:

For Bootstrap we use the two codes provide in the Lecture notes 5.

rm(list=ls()

attach(d.gas)

gas<-ts(gas,start=1959,frequency=4)

price<-ts(price,start=1959,frequency=4)

income<-ts(income,start=1959,frequency=4)

miles<-ts(miles,start=1959,frequency=4)

price2<-price^2

princ<-price*income

CODE 1 FOR BOOTSTRAP:

fit.star<-lm(gas~income+price+price2+princ)

uhat<-fit.star\$resid

R<-1000 # Number of Repetions

h<-rep(0,R)

for (i in 1:R){

gash<-fit.star\$fit+sample(uhat,replace=TRUE)

b<-lm(gash~income+price+price2+princ)\$coef

h[i]<- (-(1+b[3]+b[5]*log(15))/(2*b[4]))

}

quantile(h,c(0.025,0.975))

> quantile(h,c(0.025,0.975))

2.5%     97.5%

-123.5313   72.2807

CODE 2 FOR BOOTSTRAP:

n<-length(gas)

R<-1000 # Number of Repetions

a<-rep(0,R)

for (i in 1:R){

s<-sample(1:n,replace=TRUE)

f<-lm(gas[s]~income[s]+price[s]+price2[s]+princ[s])

coefs<-f\$coefficients

a[i]<-(-(1+coefs[3]+coefs[5]*log(15))/(2*coefs[4]))

}

quantile(a,c(0.025,0.975))

> quantile(a,c(0.025,0.975))

2.5%      97.5%

-125.77590   94.38562

 Last update: September 12, 2007