Welcome to e-Tutorial, your on-line help to Econ 536. This issue provides an introduction on how to do the pratical works about the Delta-method and bootstrap in R. Hope this will be helpful for your further understanding of Prof. Koenker’s Lecture 5.¹

Data Set

The data set used in this tutorial was borrowed from Johnston and DiNardo’s Econometric Methods (1997, 4th ed), but slightly adjusted for your needs. It is called AUTO2. You can download the data by visiting the Econ 536 web site (Data). As you will see, this adapted data set contains five series.

    auto<-read.table("http://www.econ.uiuc.edu/~econ536/Data/AUTO2.txt",header=T)
    head(auto)

  quarter       gas    price    income    miles
1  1959.1 -8.015248 4.675750 -4.505240 2.647592
2  1959.2 -8.011060 4.691292 -4.492739 2.647592
3  1959.3 -8.019878 4.689134 -4.498873 2.647592
4  1959.4 -8.012581 4.722338 -4.491904 2.647592
5  1960.1 -8.016769 4.707470 -4.490103 2.647415
6  1960.2 -7.976376 4.699136 -4.489107 2.647238

As we did before we need to transform the data in “time series” first:

    library(dyn)

Loading required package: zoo


Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

    gas<-ts(auto$gas,start=1959,frequency=4)
    price<-ts(auto$price,start=1959,frequency=4)
    income<-ts(auto$income,start=1959,frequency=4)
    miles<-ts(auto$miles,start=1959,frequency=4)

Running a Dynamic Model with Quadratic and Multiplicative Terms

In the problem set 2, question 4, you are asked to run a linear regression model with non-linear transformation of variables. Suppose for a moment that you have in your hands a data set like the one used here (auto2.dta), and would like to estimate an equation similar to the problem set:

\[gas_{t} = b_{0} + b_{1} income_{t} + b_{2} price_{t} + b_{3} price_{t} ^2 + b_{4} (price_{t}*income_{t}) + u_{t}\]

first you need first to generate the quadratic and other terms as follows:

    price2<- price^2 
    price_income<- price*income

then regress:

    f<-lm(gas~income+price+price2+price_income)
    summary(f)


Call:
lm(formula = gas ~ income + price + price2 + price_income)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.10428 -0.04569 -0.01199  0.04995  0.12215 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -25.5220    11.8967  -2.145   0.0339 *
income        -6.1094     2.7500  -2.222   0.0281 *
price          3.0900     2.8947   1.067   0.2879  
price2         0.2839     0.1693   1.677   0.0961 .
price_income   1.4542     0.5879   2.474   0.0147 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0556 on 123 degrees of freedom
Multiple R-squared:  0.7927,    Adjusted R-squared:  0.786 
F-statistic: 117.6 on 4 and 123 DF,  p-value: < 2.2e-16

Now, suppose you are asked to calculate the price elasticity of demand at different points of the sample. To do that you will need to:

Obtain the coefficients of regression:

    coefs<-f$coef
    coefs

 (Intercept)       income        price       price2 price_income 
 -25.5219744   -6.1093816    3.0899926    0.2839286    1.4542421

Extract the coefficients to scalars:

    b0<-coefs[1] 
    b1<-coefs[2]
    b2<-coefs[3] 
    b3<-coefs[4]
    b4<-coefs[5]

iii) Calculate the elasticity at different points of the sample:

    elastpt<-b2+2*b3*price+b4*income 
    
    summary(elastpt)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.80660 -0.49110 -0.27540 -0.32740 -0.07784  0.05557

iv) OK. You've just created your elasticity series. Now you can study it at different points of the sample, plot it against year  (check for inter-temporal structural breaks), against gas expenditure, and against income.

    plot(elastpt)

    plot(auto$gas, elastpt,  main="Price Elasticity vs. Gas Expendituree", xlab="Log per capita real expenditure", ylab="Price Elasticity")

    pdf("plot_elastc_income.pdf",width=6.5,height=4.5)
    plot(auto$income, elastpt,  main="Price Elasticity vs. Per Capita Income", xlab="Log per capita real disposable income", ylab="Price Elasticity", type="p")
    dev.off()

png 
  2

First note that we added options in each plot, you can explore the options of this function in the help file. Note also that the code for the last plot includes the code pdf("plot_elastc_income.pdf",width=6.5,height=4.5) and dev.off(). This lines saves the plot as a pdf file in your working directory with width 6.5 inches and height 4.5 inches.

Tip for problem set 2, question 5.

For this question you should:

Interpret the implications of the model.
Calculate the price such that $[b_{2}+2b_{3}price+b_{4}*income] = -1 $ Given the formula of elasticity, and assuming $income=x_{0}$, just find the optimal price. Call it $p^{*}$.
Examine the partial residual plots
Finally, in the last part of the question 5 you are asked to estimate model 2 and to compute the revenue maximizing price level assuming $income=\$30.000$ per year. This is a straightforward computation. The interesting part come with the application of the delta method and the bootstrap to achieve reasonable confidence intervals for the optimal price.

Delta-method and Bootstrap

Note that we obtained point estimates. To compute confidence intervals, you will need the Delta-method and/or Bootstrap. In the problem set you are asked to assume that $income=\$30.000$ per year. For this e-ta, we will assume $income=log(15)=2.708050$ approximately

Delta-method

For the problem set you are expected to sketch the Delta-method and calculate the derivatives by hand along with the computational routine below.

Start running the full model again:

    f<-lm(gas~income+price+price2+price_income)

Recall that you have stored those coefficients under the names b0, b1, b2 , b3, b4. Now you need to get the covariance matrix V of them:

    vc<-vcov(f)
    vc

             (Intercept)       income       price       price2
(Intercept)  141.5308590 30.885346024 -33.2283354  0.629758149
income        30.8853460  7.562592303  -6.5628355 -0.008335078
price        -33.2283354 -6.562835508   8.3794079 -0.270044284
price2         0.6297581 -0.008335078  -0.2700443  0.028667163
price_income  -6.6088020 -1.616645086   1.4058909  0.001491940
             price_income
(Intercept)   -6.60880203
income        -1.61664509
price          1.40589093
price2         0.00149194
price_income   0.34564029

Next you need to input pstar:

  pstar <- -(1+b2+b4*log(15))/(2*b3) 
  pstar

    price 
-14.13763

And then you need to obtain the gradient vector with the partial derivatives of pstar (optimal price) with respect to each regressor, obeying the order of the covariance matrix:

\[G'= (\frac{\partial{p^{*}}}{\partial{cons}}, \frac{\partial{p^{*}}}{\partial{income}}, \frac{\partial{p^{*}}}{\partial{price}}, \frac{\partial{p^{*}}}{\partial{price2}}, \frac{\partial{p^{*}}}{\partial{price\_income}}) \]

In R the vector will be as follows:

First generate a vector containing the values

    g<-c(0,0,-1/(2*coefs[4]),(1+coefs[3]+coefs[5]*log(15))/(2*coefs[4]*coefs[4]),-log(15)/(2*coefs[4]))
    g

                       price2     price    price2 
 0.000000  0.000000 -1.761006 49.792894 -4.768893

Then generate the matrix

    G<-matrix(g,ncol=1,nrow=5)
    G

          [,1]
[1,]  0.000000
[2,]  0.000000
[3,] -1.761006
[4,] 49.792894
[5,] -4.768893

The next step is to obtain the estimated variance of the optimal price, $G'VG$:

    Var<-t(G)%*%vc%*%G
    Var

         [,1]
[1,] 175.1848

Then you can obtain the standard error by taking the square root of this scalar value:

    se.pstar<-sqrt(Var)
    se.pstar

         [,1]
[1,] 13.23574

That’s it. The variable se.pstar is the standard error of pstar.

The last step is to construct your confidence interval, and put the variables in levels, as follows:

    Ciupper<-pstar + 1.96*se.pstar
    Cilower<-pstar - 1.96*se.pstar
    pstar_ci<-c(Ciupper,pstar,Cilower)
    pstar_ci

              price           
 11.80442 -14.13763 -40.07967

This will give you the confidence interval of the optimal price level. Observe that the results you obtained are for prices in logs. You can try to get the respective results in levels as well, but it is worth to think about whether you can do that directly or you need some additional step because of the non-linearity of the point estimate.

Bootstrap

You are expected to explain the various Bootstrap techniques in words (as if you are explaining for a non-econometrician), along with the complete understanding of the results provided by STATA.

In R, we use the two codes provide in the Lecture notes 5.

Residual Bootstrap:

    set.seed(1)

    fit.star<-lm(gas~income+price+price2+price_income)

    uhat<-fit.star$resid

    R<-1000 # Number of Repetions

    h<-rep(0,R)

    for (i in 1:R){
        gash<-fit.star$fit+sample(uhat,replace=TRUE)

        b<-lm(gash~income+price+price2+price_income)$coef

        h[i]<- (-(1+b[3]+b[5]*log(15))/(2*b[4]))
    }

    quantile(h,c(0.025,0.975))

     2.5%     97.5% 
-118.5989  107.1206

(x,y) Bootstrap

    n<-length(gas)

    R<-1000 # Number of Repetions

    a<-rep(0,R)

    for (i in 1:R){
        s<-sample(1:n,replace=TRUE)
        
        f<-lm(gas[s]~income[s]+price[s]+price2[s]+price_income[s])

        coefs<-f$coefficients

        a[i]<-(-(1+coefs[3]+coefs[5]*log(15))/(2*coefs[4]))
    }

    quantile(a,c(0.025,0.975))

     2.5%     97.5% 
-171.9723  110.4158

Recall the generated statistic is in log form. A good question is whether you should present your Delta-method and Bootstrap confidence interval in logs or in levels… Think about it.

Appendix A: Partial Residual Plot Revisited

It is important to mention that all results presented here are based on a different data set (auto2.dta) than the data set used on the problem set 2 (gasnew.dat):

Use the partial residual plot to check on the effect of the quadratic term. To obtain the partial residual plot with respect to the quadratic term you should :

1.1) Estimate model (2) without the regressor price2, and call this model (2.1)

    model_2.1<-lm(gas~income+price+price_income)

1.2) Obtain the residuals of the model (2.1):

    model_2.1.res<-resid(model_2.1)

1.3) Estimate model (2) using price2 instead of gas as the dependent variable; call it model (2.2):

    model_2.2<-lm(price2~income+price+price_income)

1.4) Obtain the residuals of the model (2.2):

    model_2.2.res<-resid(model_2.2)

1.5) Run the Gauss-Frisch-Waugh “regression”, and check if the slope coefficient is the same as in the original model (2):

    h<-lm(model_2.1.res~model_2.2.res)
    summary(h)


Call:
lm(formula = model_2.1.res ~ model_2.2.res)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.10428 -0.04569 -0.01199  0.04995  0.12215 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.612e-19  4.856e-03   0.000   1.0000  
model_2.2.res 2.839e-01  1.673e-01   1.697   0.0921 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05494 on 126 degrees of freedom
Multiple R-squared:  0.02235,   Adjusted R-squared:  0.01459 
F-statistic: 2.881 on 1 and 126 DF,  p-value: 0.09212

1.6) Plot the the partial residuals, with a fitting line of predicted values

    plot(model_2.2.res,model_2.1.res,main="Partial Residuals",xlab="Gas",ylab="Price sqr")
    abline(h)

Check whether there is a linear relationship between the residuals of the model (2.1) and the residuals of model (2.2). Draw your conclusion from what you see in the graph, and try to justify your answer in the light of basic assumptions of linear regression.

Please send comments to hrtdmrt2@illinois.edu or srmntbr2@illinois.edu ↩

e-TA 5: Delta Method and Bootstrap Techniques