Applied Econometrics
Econ 508 - Fall 2007

e-Tutorial 15: Quantile Regression

Welcome to the fifteenth issue of e-Tutorial. Here you will see basic applications of the Koenker and Bassett (1978) Quantile Regression methodology. The target is Problem Set 5 (PS5).

You can download the data from the Econ 508 web page and save the file in your preferred directory (I'll save mine as "C:\weco.dat"). Then open STATA and type:

infile  y sex dex lex kwit tenure censored using "C:\weco.dat"

The first line of the data set contains missing values, because the .txt file begins with the variable labels. Drop it:

drop in 1

Then save the file in STATA format (I'll save mine as "C:\weco.dta"):

save "C:\weco.dta"

Question 1:

In part (a) you will run a linear regression of productivity on sex, dexterity, and a quadratic in education:

gen lex2=lex^2
regress  y sex dex lex lex2

Then test the joint hypothesis that the coefficients on lex and lex2 are both zero:
test lex lex2

Note: The test above is a Wald test, based on a quadratic approximation to the likelihood function; its statistic is the traditional F-statistic. If you wish, you can instead test this hypothesis with a likelihood-ratio (LR) test that compares the restricted and unrestricted models. The test statistic is then chi-squared:

\[ LR = -2\,(L_1 - L_0) \sim \chi^2_{d_0 - d_1} \]

where L1 and L0 are the maximized log-likelihood values, and d1 and d0 the model degrees of freedom (number of estimated parameters), of the constrained (1) and unconstrained (0) regressions, respectively. To obtain the test in STATA, proceed as follows:

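* unrestricted model: store its estimates under a name
regress  y sex dex lex lex2
estimates store unrestricted
* restricted model, then compare it (".") with the stored one
regress  y sex dex
lrtest unrestricted .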
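For a linear regression with normal errors, the maximized log-likelihood depends on the data only through the residual sum of squares (RSS), so the LR statistic above reduces to n ln(RSS1/RSS0). The Python sketch below illustrates that equivalence and computes the chi-squared p-value by hand. It assumes you already have the two RSS values (STATA stores them in e(rss) after regress, and the log-likelihood in e(ll)); the RSS numbers and function names are hypothetical, and the p-value helper uses a closed form that is valid only for an even number of degrees of freedom, which covers the two restrictions tested here.

```python
import math

def gaussian_loglik(n, rss):
    """Maximized log-likelihood of a linear regression with i.i.d. normal
    errors, evaluated at the ML error variance rss / n."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)

def lr_statistic(ll_restricted, ll_unrestricted):
    """LR = -2 (L1 - L0): L1 restricted (constrained), L0 unrestricted."""
    return -2.0 * (ll_restricted - ll_unrestricted)

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-squared(df), df even:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k  # term becomes (x/2)^k / k!
        total += term
    return math.exp(-half) * total

# Hypothetical residual sums of squares, for illustration only.
n, rss_restricted, rss_unrestricted = 683, 1200.0, 1150.0
lr = lr_statistic(gaussian_loglik(n, rss_restricted),
                  gaussian_loglik(n, rss_unrestricted))
# For OLS the same statistic is n * ln(RSS1 / RSS0).
lr_from_rss = n * math.log(rss_restricted / rss_unrestricted)
p_value = chi2_sf_even_df(lr, df=2)  # d0 - d1 = 2 restrictions (lex, lex2)
```

In practice lrtest does all of this for you; the sketch only makes explicit where the chi-squared statistic and its degrees of freedom come from.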

Finally, test the single hypothesis that the coefficient on lex2 is zero:

regress  y sex dex lex lex2
test lex2

Don't forget to interpret the economic meaning of the results.

In part (b) you are asked to plot the productivity-education profile implied by the regression of part (a), holding dexterity at its mean value and setting sex to 0 or 1 (each gender has its own curve; you can plot both curves in the same graph).

Finally, you need to test whether the shape of the education profile differs by gender. To do that, you can create, for example, the following interaction variables:

gen sexlex=sex*lex
gen sexlex2=sex*lex2

Then estimate the models that include these variables and test their significance:

regress  y sex dex lex lex2 sexlex
test sexlex
regress  y sex dex lex lex2 sexlex2
test sexlex2
regress  y sex dex lex lex2 sexlex sexlex2
test sexlex sexlex2

In part (c) you need to construct a confidence interval for the optimal level of education (lex*), i.e., the turning point of the fitted quadratic education profile. You can do that based on the previous tutorials and class notes.
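In the quadratic model, setting the derivative b_lex + 2 b_lex2 lex to zero gives lex* = -b_lex / (2 b_lex2), a maximum when b_lex2 < 0. One common way to build the interval is the delta method. The Python sketch below assumes you have the two coefficient estimates and their variances and covariance (the function names are hypothetical, and the 1.96 critical value is a normal approximation). In STATA, nlcom -_b[lex]/(2*_b[lex2]) after regress reports a delta-method standard error and interval for the same ratio; Fieller's method is an alternative that this sketch does not implement.

```python
import math

def optimal_education(b_lex, b_lex2):
    """Turning point of b_lex*lex + b_lex2*lex^2: lex* = -b_lex / (2 b_lex2)."""
    return -b_lex / (2.0 * b_lex2)

def delta_method_ci(b_lex, b_lex2, var_lex, var_lex2, cov, z=1.96):
    """Delta-method standard error and normal-approximation CI for lex*."""
    # Gradient of g(b1, b2) = -b1 / (2 b2) with respect to (b1, b2).
    g1 = -1.0 / (2.0 * b_lex2)
    g2 = b_lex / (2.0 * b_lex2 ** 2)
    # Var(g) ~ grad' V grad, with V the 2x2 covariance of (b_lex, b_lex2).
    var = g1 ** 2 * var_lex + g2 ** 2 * var_lex2 + 2.0 * g1 * g2 * cov
    se = math.sqrt(var)
    point = optimal_education(b_lex, b_lex2)
    return point, se, (point - z * se, point + z * se)
```

The covariance term matters: b_lex and b_lex2 are typically strongly correlated, so plugging in the two standard errors alone would misstate the width of the interval.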

Question 2:

For Quantile Regression in R, see Appendix A below.  For Quantile Regression in STATA, start here:

Part (a):  I suggest the following strategy:

- Run quantile regressions of the Question 1 model for at least the 5th, 25th, 50th, 75th, and 95th quantiles:

qreg y sex dex lex lex2, quant(.05)
qreg y sex dex lex lex2, quant(.25)
qreg y sex dex lex lex2, quant(.50)
qreg y sex dex lex lex2, quant(.75)
qreg y sex dex lex lex2, quant(.95)

Feel free to include as many quantiles as you wish. For example, the output for the median regression is:

Median regression                                    Number of obs =       683
Raw sum of deviations 790.2578 (about 14.62685)
Min sum of deviations 609.7646                     Pseudo R2     =    0.2284

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     sex |  -.8966177    .105081     -8.533   0.000      -1.102941   -.6902943
     dex |   .1119312   .0071806     15.588   0.000       .0978324    .1260301
     lex |   .9280681   .3468423      2.676   0.008        .247054    1.609082
    lex2 |  -.0404344   .0135943     -2.974   0.003      -.0671263   -.0137424
   _cons |   4.923647   2.241603      2.196   0.028       .5223291    9.324965
------------------------------------------------------------------------------
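qreg minimizes the Koenker-Bassett check function rather than a sum of squared residuals. The Python sketch below (standard library only; the helper names are hypothetical) defines the check function, shows that the sample tau-quantile minimizes its sum over constants (which is why, for the median regression, the raw sum of deviations is taken "about" 14.62685, the sample median of y), and reproduces the Pseudo R2 line as 1 minus the ratio of the minimized to the raw sum of deviations. STATA may scale its printed sums by a constant factor relative to the textbook check function; any such factor cancels in the ratio.

```python
import math

def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sum_of_deviations(y, fitted, tau):
    """Objective minimized by quantile regression at quantile tau."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))

def sample_quantile(y, tau):
    """The ceil(n*tau)-th order statistic, which minimizes
    sum(check_loss(yi - c, tau)) over constants c for 0 < tau < 1."""
    ys = sorted(y)
    k = max(math.ceil(len(ys) * tau), 1)
    return ys[k - 1]

def pseudo_r2(min_sum, raw_sum):
    """1 - (minimized sum of deviations) / (sum of deviations about the
    unconditional quantile)."""
    return 1.0 - min_sum / raw_sum

# Sums printed by qreg for the median regression above.
print(round(pseudo_r2(609.7646, 790.2578), 4))  # 0.2284
```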

- Next, it would be useful to construct a table like the following (again, feel free to include as many quantiles as you wish):

Table 1: Quantile regression estimates for different quantiles
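Quantile | constant                 | sex | dex | lex | lex2
5th      | coefficient (std. error) | ... | ... | ... | ...
25th     | ...                      | ... | ... | ... | ...
50th     | ...                      | ... | ... | ... | ...
75th     | ...                      | ... | ... | ... | ...
95th     | ...                      | ... | ... | ... | ...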

- After you have run the regressions for the different quantiles, you can plot the fitted curve implied by each of them.

Bootstrapped standard errors in STATA:

Bootstrapped standard errors are recommended. Because the bootstrap draws random resamples, however, its results change from one run to the next. To ensure reproducibility, fix the seed of the pseudo-random number generator before bootstrapping:

set seed 2

Here the number 2 can be replaced by any other initial value. The results are reproducible as long as you set the same seed each time you rerun the quantile regression. To obtain bootstrapped standard errors, use the command bsqreg, for example with 500 replications:

bsqreg y sex dex lex lex2, quant(.50) reps(500)

(estimating base model)
(bootstrapping ................................................................
> .............................................................................
> .............................................................................
> .............................................................................
> .............................................................................
> .............................................................................
> ...................................................)

Median regression, bootstrap(500) SEs                Number of obs =       683
Raw sum of deviations 790.2578 (about 14.62685)
Min sum of deviations 609.7646                     Pseudo R2     =    0.2284

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     sex |  -.8966177   .1493668     -6.003   0.000      -1.189895   -.6033406
     dex |   .1119312   .0064046     17.477   0.000       .0993559    .1245065
     lex |   .9280681   .3649069      2.543   0.011       .2115847    1.644551
    lex2 |  -.0404344   .0135158     -2.992   0.003      -.0669723   -.0138964
   _cons |   4.923647   2.450279      2.009   0.045       .1125995    9.734694
------------------------------------------------------------------------------
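Comparing this output with the earlier median regression, the coefficients are identical; only the standard errors, t-statistics, and intervals change. The Python sketch below illustrates why fixing the seed makes a bootstrap reproducible. It resamples observations with replacement and reports the standard deviation of the replicated estimates, where, to keep the example short, the estimate is just the sample tau-quantile of one variable rather than a full quantile regression. The function names and toy data are hypothetical.

```python
import math
import random
import statistics

def sample_quantile(y, tau):
    """ceil(n*tau)-th order statistic (a minimizer of the check-function sum)."""
    ys = sorted(y)
    return ys[max(math.ceil(len(ys) * tau), 1) - 1]

def bootstrap_se(y, tau, reps=500, seed=2):
    """Bootstrap standard error of the sample tau-quantile.

    A dedicated random.Random(seed) generator produces the resamples, so the
    replications, and hence the standard error, repeat exactly for a given seed."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(reps):
        resample = [y[rng.randrange(n)] for _ in range(n)]  # n draws with replacement
        estimates.append(sample_quantile(resample, tau))
    return statistics.stdev(estimates)

toy = [float(v) for v in range(1, 101)]  # hypothetical data
se_a = bootstrap_se(toy, 0.5, seed=2)
se_b = bootstrap_se(toy, 0.5, seed=2)
print(se_a == se_b)  # True: same seed, same bootstrap draws
```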

For part (b) you need to compute the predicted productivity implied by each of the quantile regressions estimated above, by years of education. First, find the range of years of education:

summarize lex

Then construct a table with the estimated productivity for each year in the range of lex and each quantile:

Table 2: Estimated productivity according to quantiles and years of education.
Quantile \ lex | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  | 16  | 17  | 18  | 19
5th            | estimated productivity ...
25th           | ...
50th           | ...
75th           | ...
95th           | ...
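Each entry of such a table is a fitted conditional quantile, b0 + b_sex*sex + b_dex*dex + b_lex*lex + b_lex2*lex^2, evaluated at one value of lex. The Python sketch below fills the rows for the 0.25, 0.5, and 0.75 quantiles using the coefficients reported by table.rq in Appendix A. The value of dex (for example, its sample mean from summarize dex) and the sex indicator are inputs you must supply; the 5th and 95th rows would need their own estimates. All names are hypothetical.

```python
# Coefficients from the table.rq output in Appendix A (tau = 0.25, 0.5, 0.75).
COEFS = {
    0.25: {"const": 5.55654002, "sex": -0.75837469, "dex": 0.10646500,
           "lex": 0.69023452, "lex2": -0.02918982},
    0.50: {"const": 4.92366053, "sex": -0.89661622, "dex": 0.11193125,
           "lex": 0.92806584, "lex2": -0.04043428},
    0.75: {"const": 5.99592159, "sex": -1.04198000, "dex": 0.11531778,
           "lex": 0.94287012, "lex2": -0.04431575},
}

def fitted_quantile(b, sex, dex, lex):
    """Fitted conditional quantile of productivity y for one covariate profile."""
    return (b["const"] + b["sex"] * sex + b["dex"] * dex
            + b["lex"] * lex + b["lex2"] * lex ** 2)

def productivity_table(sex, dex, lex_values=range(8, 20)):
    """Rows: quantiles; columns: years of education (8 to 19, as in Table 2)."""
    return {tau: [fitted_quantile(b, sex, dex, lex) for lex in lex_values]
            for tau, b in COEFS.items()}

def print_table(table, lex_values=range(8, 20)):
    """Print one row per quantile, one column per year of education."""
    print("tau  " + " ".join(f"{lex:>6d}" for lex in lex_values))
    for tau, row in table.items():
        print(f"{tau:<4} " + " ".join(f"{v:6.2f}" for v in row))

# Example: print_table(productivity_table(sex=0, dex=dex_mean)), where dex_mean
# is a placeholder for the sample mean of dex reported by STATA.
```

Calling productivity_table once with sex=0 and once with sex=1 gives the separate tables by gender suggested below.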

Again, feel free to include more quantiles in your analysis. Please provide an economic discussion of your findings. Splitting the results by gender (e.g., one table for men and another for women) may shed light on the hiring decisions.

In part (c) you need to interpret the question and solve the problem on your own. The tables above will be very helpful. You can also compare productivity densities across ranges of years of education, or use any other strategy you think is reasonable. Here are some examples:

*Scatter plot of productivity according to years of education:

scatter y lex

*Kernel density of productivity for individuals with less than 12 years of education (with a normal density overlaid):

kdensity y if lex<12, normal title(Productivity Density for Lower Education)

*Compare with the kernel density of productivity for individuals with more than 15 years of education:

kdensity y if lex>15, normal title(Productivity Density for Higher Education)

*Finally, you can compare the productivity of men and women in a single graph, regardless of years of education:

kdensity y, nogr gen(x fx)
kdensity y if sex==0, nogr gen(fx0) at(x)
kdensity y if sex==1, nogr gen(fx1) at(x)
label var fx0 "Women"
label var fx1 "Men"
line fx0 fx1 x, sort title(Productivity Densities for Men and Women)
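The at(x) option evaluates both group densities at the same points, which is what lets them share one axis. The Python sketch below does the same by hand with a Gaussian kernel and the normal-reference bandwidth 1.06 s n^(-1/5); both are assumptions for illustration (STATA's kdensity uses an Epanechnikov kernel by default). The normal_pdf helper corresponds to the overlay drawn by the normal option. The data and function names are hypothetical.

```python
import math
import statistics

def normal_pdf(x, mean, sd):
    """Normal density, also used as the Gaussian kernel."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kde(data, grid, bandwidth=None):
    """Gaussian kernel density estimate of `data`, evaluated at each point of `grid`."""
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * s * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-0.2)
    return [sum(normal_pdf(g, xi, bandwidth) for xi in data) / n for g in grid]

def common_grid(data, points=50):
    """Evenly spaced grid spanning the pooled sample (the role of x above)."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (points - 1)
    return [lo + i * step for i in range(points)]

# Hypothetical example: productivity y and a 0/1 sex indicator.
y = [12.1, 13.4, 14.0, 14.6, 15.2, 15.9, 13.1, 14.8, 16.3, 12.7]
sex = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
x = common_grid(y)
fx0 = kde([yi for yi, s in zip(y, sex) if s == 0], x)  # group sex == 0
fx1 = kde([yi for yi, s in zip(y, sex) if s == 1], x)  # group sex == 1
```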

Appendix A: Quantile Regression in R

You can also perform quantile regression in R, which has several advantages. For example, you can generate a table with the coefficients of several quantile regressions in a single command, and you can plot each regression coefficient (with its confidence interval) across all the estimated quantiles. Moreover, bootstrapped standard errors can be obtained much faster than in STATA.

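You can start a simple R session for PS5 as follows:

1) Read the data into a data frame called weco. For example, assuming the first line of the file holds the variable names (y, sex, dex, lex, ...):

weco<-read.table("C:/weco.dat", header=TRUE)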

2) Extract the variables from the data set:

y<-weco$y
sex<-weco$sex
dex<-weco$dex
lex<-weco$lex
lex2<-lex^2

Note: If you haven't done so yet, you need to install the package for Quantile Regression, developed by Prof. Koenker, and available at the CRAN web site under the name "quantreg". With your computer connected to the web, you can do that by typing the following commands in the R console:

help(install.packages)
install.packages("quantreg")

3) Before using the quantile regression functions, load the quantreg package:

library(quantreg)

4) Run your desired quantile regression. For example, for the median:

rq(y~sex+dex+lex+lex2, tau=.5)

Call:
rq(formula = y ~ sex + dex + lex + lex2, tau = 0.5)

Coefficients:
            coefficients    lower bd    upper bd
(Intercept)   4.92366053  0.73781565  9.04266184
sex          -0.89661622 -1.17194642 -0.75290670
dex           0.11193125  0.10427513  0.12260806
lex           0.92806584  0.37136396  1.52973862
lex2         -0.04043428 -0.06285557 -0.02524041

Degrees of freedom: 683 total; 678 residual
Warning message:
Solution may be nonunique in: rq.fit.br(x, y, tau = tau, ...)

5) To create a table with the estimates for the default quantiles (0.25, 0.5, and 0.75), you can write:

TAB<-table.rq(y~sex+dex+lex+lex2, method="br")
TAB

$a
, , tau= 0.25

                  coefs lower ci limit upper ci limit
(Intercept)  5.55654002    -0.03709115     7.77709369
sex         -0.75837469    -0.90836371    -0.59763598
dex          0.10646500     0.09662178     0.11577277
lex          0.69023452     0.39418993     1.50958013
lex2        -0.02918982    -0.03901967    -0.01760687

, , tau= 0.5

                  coefs lower ci limit upper ci limit
(Intercept)  4.92366053     0.73781565     9.04266184
sex         -0.89661622    -1.17194642    -0.75290670
dex          0.11193125     0.10427513     0.12260806
lex          0.92806584     0.37136396     1.52973862
lex2        -0.04043428    -0.06285557    -0.02524041

, , tau= 0.75

                  coefs lower ci limit upper ci limit
(Intercept)  5.99592159     3.77638255     10.4558375
sex         -1.04198000    -1.20939712     -0.8321452
dex          0.11531778     0.10555050      0.1356534
lex          0.94287012     0.22175980      1.2691270
lex2        -0.04431575    -0.05597181     -0.0144157

$taus
[1] 0.25 0.50 0.75

$method
[1] "br"

attr(,"class")
[1] "table.rq"

6) To plot each coefficient, with its confidence interval, across these regression quantiles, you can write:

plot(TAB)

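7) To obtain the standard errors, t-statistics, and p-values for a given quantile regression (e.g., the 0.10 quantile), you can write the commands below. In recent versions of quantreg, summary() reports rank-inversion confidence intervals by default for samples of this size; to get standard errors, request them explicitly, e.g., summary(fit1, se="nid").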

fit1<-rq(y~sex+dex+lex+lex2, tau=.10)
summary(fit1)

Call: rq(formula = y ~ sex + dex + lex + lex2, tau = 0.1)

Coefficients:
Value    Std. Error t value  Pr(>|t|)
(Intercept)  5.62075  3.07783    1.82620  0.06826
sex         -0.64677  0.20454   -3.16208  0.00164
dex          0.09684  0.01163    8.32661  0.00000
lex          0.55009  0.40258    1.36642  0.17226
lex2        -0.02078  0.01309   -1.58729  0.11291

References:
Koenker, R. and G. Bassett, 1978, "Regression quantiles," Econometrica, 46, 33-50.

 Last update: November 13, 2007