Applied econometrics: Econ 508

Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. This time we focus on measures of inequality. We will suggest some basic methods to calculate the Hill estimator, the Lorenz curve, and the Gini coefficient. The data set is the same one used in problem set 4. [1]

Data

The first thing you need to do is download the phuzics panel data set, phuzics10.txt, from the Econ 508 web site. Save it in your preferred directory.

The next step is to load the data into R. If you saved it on your hard drive, you can load it by typing:

  phuzics<-read.table("C:/econ508/phuzics10.txt", header=T, sep="")

or you can retrieve it from the web

  phuzics<-read.table("http://www.econ.uiuc.edu/~econ508/data/phuzics10.txt", header=T, sep="")

Hill Estimator

Here we compute the Hill estimator mentioned in Prof. Koenker’s Lecture 19 and in “Appendix A - Concentration of Productivity in Phuzics Scholarship”. The idea is to calculate the index of concentration “\(\alpha\)” for each year between 1970 and 1990 and check whether there is a trend. As mentioned in the note, if there is an unbiased positive trend, you can infer that phuzics productivity is becoming less concentrated, and the field is becoming less scientific (according to Parzen’s definition of the term). An unbiased negative trend would mean the reverse. The code below calculates the alpha-coefficient of concentration for a given year (say, 1970). You are expected to adjust the code so that you can reproduce the experiment for the other periods:

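  p70<-phuzics[phuzics$yr==70,]      # keep 1970 only; phuzics itself stays intact for later steps
  p70<-p70[order(-p70$y),]           # sort by y, largest first
  p70$yratio<-p70$y/p70$y[11]        # ratio to the 11th largest value
  p70<-p70[p70$y>=p70$y[10],]        # keep the 10 largest observations
  p70$a<-log(p70$yratio)
  alpha<-(mean(p70$a))^(-1)          # Hill estimator
  alpha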
[1] 0.7951886
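
In formula terms, and barring ties at the 10th largest value, the steps above compute the Hill estimator from the \(k=10\) largest observations. The notation \(y_{(1)}\geq y_{(2)}\geq\dots\) for the sorted values and the symbol \(k\) are introduced here for illustration and may differ from the notation in the lecture notes:

\[ \hat{\alpha}=\left[\frac{1}{k}\sum_{i=1}^{k}\log\frac{y_{(i)}}{y_{(k+1)}}\right]^{-1},\qquad k=10 \]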

You are asked to repeat this for the remaining years. To make things easier, you can wrap the steps in a function and use sapply to collect the results:

  hill<-function(year){
    phuzics<-phuzics[phuzics$yr==year,]  
    phuzics<-phuzics[order(-phuzics$y),]
    phuzics$yratio<-phuzics$y/phuzics$y[11]  
    phuzics<-phuzics[phuzics$y>=phuzics$y[10],]  
    phuzics$a<-log(phuzics$yratio)
    alpha<-(mean(phuzics$a))^(-1) 
    alpha
  }

  year<-70:102
  hill_results<-sapply(year,hill)    # one Hill estimate per year

You can then plot the estimates with a fitted trend line and interpret whether the field has become more or less concentrated:

  fit<-lm(hill_results~year)
  plot(year,hill_results)
  abline(fit)
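
To judge whether a fitted trend is distinguishable from zero, one option is to look at the slope estimate together with its standard error and a confidence interval. The sketch below is only an illustration: `year_demo` and `alpha_demo` are hypothetical names, and the alphas are simulated placeholder values, not results from the phuzics data.

```r
set.seed(1)

# Simulated stand-ins for `year` and `hill_results` (made-up values)
year_demo  <- 70:102
alpha_demo <- 1 + 0.01 * (year_demo - 70) + rnorm(length(year_demo), sd = 0.1)

fit_demo <- lm(alpha_demo ~ year_demo)

# Row of the coefficient table for the trend:
# Estimate, Std. Error, t value, Pr(>|t|)
slope_row <- coef(summary(fit_demo))["year_demo", ]
slope_row

# 95% confidence interval for the slope
slope_ci <- confint(fit_demo, "year_demo", level = 0.95)
slope_ci
```

With the real data, the same calls can be applied to the `fit` object created above.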

Bootstrap Bias-Correction for the Hill Estimator:

You can also implement the bootstrap bias correction for the Hill estimator by using the function sample. We will do that for one year, say, 1970, and then you can replicate the experiment for other years. As a suggestion, you can do as before: wrap everything in a function and take advantage of sapply. Recall from the Notes that the formula is:

\[ \tilde{\alpha}_{t}=\hat{\alpha}_{t}\frac{\hat{\alpha}_{N}}{\hat{\alpha}_{n_{t}}} \]

The strategy is as follows:

Numerator \(\hat{\alpha}_{N}\):

The first step is to calculate the numerator of the formula on page 2 of the note “Appendix A - Concentration of Productivity in Phuzics Scholarship”, called alpha_N. This is done as follows.

  1. Open the original data set;
  phuzics<-read.table("http://www.econ.uiuc.edu/~econ508/data/phuzics10.txt", header=T, sep="")
  2. Drop all observations not included in your interval of interest, say not included in 1970-1990;
  phuzics<-phuzics[phuzics$yr>=70&phuzics$yr<=90,]  
  3. Compute the Hill estimator for this pooled sample.
  phuzics<-phuzics[order(-phuzics$y),]
  phuzics$yratio<-phuzics$y/phuzics$y[11]  
  phuzics<-phuzics[phuzics$y>=phuzics$y[10],]  
  phuzics$a<-log(phuzics$yratio)
  alpha_N<-(mean(phuzics$a))^(-1) 
  alpha_N
[1] 6.30508

Denominator \({\hat{\alpha}_{n_{t}}}\):

  1. Open the original data set;
  phuzics<-read.table("http://www.econ.uiuc.edu/~econ508/data/phuzics10.txt", header=T, sep="")
  2. Drop all observations not included in your interval of interest, so that phuzics holds the pooled 1970-1990 sample;
  phuzics<-phuzics[phuzics$yr>=70&phuzics$yr<=90,]  
  3. Get the sample size of the respective year of interest;
  n<-sum(phuzics$yr==70)
  4. Generate a bootstrapped sample (with replacement) of the same size as the year you are working with. For example, because 1970 has 11 observations, you type
  boot_sample<-phuzics[sample(nrow(phuzics),n,replace=TRUE),]

This generates a bootstrapped sample with 11 observations drawn from the pooled sample. Note that we resample row indices: calling sample on the data frame itself would resample its columns, not its rows.

  5. Calculate the Hill estimator for this bootstrapped sample.

This routine will give you one bootstrapped alpha for the year 1970. You need to repeat the experiment “B” times (say, 20 times) to get “B” different bootstrapped alphas for 1970. After that, take the average of those “B” alphas and use this number as the denominator of the formula.

One way to achieve this result is to wrap everything in a function and use a loop:

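  hill.den<-function(data.var,n){  
    s<-sample(data.var,n,replace=TRUE)   # bootstrap draw of size n
    s<-sort(s, decreasing = TRUE)
    top<-s[s>=s[10]]                     # the 10 largest draws (plus any ties)
    a<-log(top/s[11])                    # log ratios to the 11th largest draw
    (mean(a))^(-1)                       # Hill estimator for this draw
  }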
  
  set.seed(12123)
  
  a<-rep(0,20)
  for(i in 1:20){
    a[i]<-hill.den(phuzics$y,n)
  }
  
  alpha_denom<-mean(a)

Finally, you need to apply the formula: multiply the original Hill estimator of 1970 by the pooled Hill estimator and divide it by the average bootstrapped estimator for 1970, so that you find a bias-corrected Hill estimator for the year 1970.

This procedure is required for every year in the period 1970-1990. Each year will have its respective corrected alpha. The final step is to plot those corrected alphas over time and check whether there is any trend.
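
The whole bias-correction procedure can be sketched as one function applied to each year. This is a minimal illustration under stated assumptions, not the course's reference solution: `hill_vec`, `hill_bc`, and `sim` are hypothetical names, the data are simulated from a Pareto distribution rather than read from phuzics10.txt, and the Hill estimator here uses exactly the top k values (the code earlier in this tutorial also keeps ties with the 10th value).

```r
# Hill estimator from the k largest values of a numeric vector
hill_vec <- function(y, k = 10) {
  s <- sort(y, decreasing = TRUE)
  1 / mean(log(s[1:k] / s[k + 1]))
}

# Bias-corrected Hill estimator for one year:
# alpha_t * alpha_N / mean(bootstrapped alphas of size n_t from the pooled sample)
hill_bc <- function(data, year, pooled_years = 70:90, B = 20, k = 10) {
  pooled  <- data$y[data$yr %in% pooled_years]   # pooled sample
  y_t     <- data$y[data$yr == year]             # observations for this year
  n_t     <- length(y_t)
  alpha_t <- hill_vec(y_t, k)                    # raw estimate for the year
  alpha_N <- hill_vec(pooled, k)                 # pooled estimate (numerator)
  alpha_b <- replicate(B, hill_vec(sample(pooled, n_t, replace = TRUE), k))
  alpha_t * alpha_N / mean(alpha_b)
}

# Simulated stand-in for the phuzics data: 50 Pareto(alpha = 2) draws per year
set.seed(508)
sim <- data.frame(yr = rep(70:90, each = 50),
                  y  = exp(rexp(21 * 50, rate = 2)))

corrected <- sapply(70:90, hill_bc, data = sim)
corrected
```

With the real data, `sim` would be replaced by the phuzics data frame, and the corrected alphas can be plotted against the years with plot and abline as in the uncorrected case.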

Lorenz Curves and Gini Coefficient

R has a great package for Lorenz curves and inequality measures, called ineq. To get a Lorenz curve, proceed as follows:

  require(ineq)
## Loading required package: ineq
  phuzics<-read.table("http://www.econ.uiuc.edu/~econ508/data/phuzics10.txt", header=T, sep="")
  Lc.y<-Lc(phuzics$y) #computes the empirical Lorenz Curve
  plot(Lc.y, col=2)

To obtain the Gini Coefficient you type:

  ineq(phuzics$y,type="Gini")
## [1] 0.3925611
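
To see what the Lorenz curve and the Gini coefficient measure, the following base-R sketch rebuilds both by hand on a toy vector. The values in `x` are made up and are not from the phuzics data, and the code illustrates the definitions rather than the ineq package's internals, although the result should agree with ineq(x, type="Gini") when no finite-sample correction is applied.

```r
x  <- c(1, 2, 3, 4)                  # toy data (hypothetical values)
xs <- sort(x)
n  <- length(xs)

p <- (0:n) / n                       # cumulative share of the population
L <- c(0, cumsum(xs)) / sum(xs)      # cumulative share of the total of x

# Area under the piecewise-linear Lorenz curve (trapezoid rule)
area <- sum(diff(p) * (head(L, -1) + tail(L, -1)) / 2)

# Gini coefficient: twice the area between the 45-degree line and the curve
gini_manual <- 1 - 2 * area
gini_manual
```

Plotting `L` against `p` with type="l" gives the same kind of picture as plot(Lc.y).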

In the problem set you are asked to compute the Gini coefficient for each year. You can proceed as above: wrap everything in a function and use sapply:

  gini<-function(year){
    phuzics<-phuzics[phuzics$yr==year,]  
    ineq(phuzics$y,type="Gini")
  }

  year<-as.vector(70:102)
  gini_results<-sapply(year,gini)
  plot(year,gini_results, type="l")

To obtain confidence intervals for the Hill and Gini coefficients, you can use the bootstrap, as we saw in a previous e-TA.
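
As a reminder of the idea, here is a minimal sketch of a percentile bootstrap interval for the Gini coefficient. It is an assumption that the percentile method is the one intended; the earlier e-TA may have used a different variant. `gini_base` is a hypothetical base-R helper standing in for ineq(x, type="Gini"), and `y_demo` is simulated data, not the phuzics sample.

```r
# Base-R Gini coefficient (no finite-sample correction)
gini_base <- function(x) {
  x <- sort(x)
  n <- length(x)
  (2 * sum(x * seq_len(n)) / sum(x) - (n + 1)) / n
}

set.seed(508)
y_demo <- exp(rexp(200, rate = 2))   # simulated Pareto-type data

B <- 500                             # number of bootstrap replications
boot_gini <- replicate(B, gini_base(sample(y_demo, replace = TRUE)))

# 95% percentile interval from the bootstrap distribution
ci <- quantile(boot_gini, c(0.025, 0.975))
ci
```

The same pattern applies to the Hill estimator: replace `gini_base` with a function that returns the Hill estimate for a resampled vector.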


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu