Applied econometrics: Econ 508
logo

Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. This e-TA will focus on on Duration Models (a.k.a. Survival Analysis) in the context of the PS5. 1

Data

You can download the data set, called weco14.txt from the Econ 508 web site. Save it in your preferred directory.

Then you can load it in R. If you saved it in your hard drive in a folder called econ508 you can load it by typing:

weco<-read.table("C:/econ508/weco14.txt", header=T, sep="")

or you can retrieve it from the web

weco<-read.table("http://www.econ.uiuc.edu/~econ508/data/weco14.txt", header=T, sep="")
head(weco)
         y sex dex lex  kwit job_tenure status
1 13.72579   0  38  10 FALSE        277   TRUE
2 17.15373   1  55  11  TRUE        173   TRUE
3 13.62992   1  45  12 FALSE        410   TRUE
4 13.03998   1  41  11 FALSE        247   TRUE
5 13.19510   1  42  10 FALSE        340   TRUE
6 16.43010   0  38  12 FALSE        517   TRUE

Next generate the variables needed:

weco$lex2<-weco$lex^2

Survival Analysis

Kaplan-Meier

To do this kind of analysis in R we are going to use the package survival

require(survival)
Loading required package: survival
Loading required package: splines

Then we need to identify the “analysis time” variable, and the “failure” variable. The former indicates the duration of the process, while the latter indicates whether the data is censored. In the PS5 data set, “tenure” represents the “analysis-time” variable, i.e., the duration of the process, while “status” represents the “failure” variable, assuming values of 0 if it is censored, and 1 if it is failure.

Initially we need to generate the Kaplan-Meier estimator for men and women:

fit <- survfit(Surv(job_tenure,status)~sex, data=weco)
plot(fit, lty=c(1,3), xlab="Time", ylab="Survival Probability")
legend(2000,0.7, legend=c("Female", "Male"), lty=c(1,3) )

Next you may want to test formally for differences between this groups. To do so you can use the survdiff function:

survdiff(Surv(job_tenure,status)~sex, data=weco)
Call:
survdiff(formula = Surv(job_tenure, status) ~ sex, data = weco)

        N Observed Expected (O-E)^2/E (O-E)^2/V
sex=0 296      240      279      5.38      10.6
sex=1 387      332      293      5.12      10.6

 Chisq= 10.6  on 1 degrees of freedom, p= 0.00111 

Cox proportional hazard model

Next the PS asks for the estimation of a Cox proportional hazard model. You can estimate such model as follows:

coxph<-coxph(Surv(job_tenure,status)~sex+dex+lex+lex2, data=weco)
coxph
Call:
coxph(formula = Surv(job_tenure, status) ~ sex + dex + lex + 
    lex2, data = weco)

        coef exp(coef) se(coef)      z       p
sex   0.5444     1.724  0.08819   6.17 6.7e-10
dex  -0.0920     0.912  0.00668 -13.77 0.0e+00
lex  -1.1586     0.314  0.32682  -3.54 3.9e-04
lex2  0.0464     1.047  0.01317   3.52 4.2e-04

Likelihood ratio test=217  on 4 df, p=0  n= 683, number of events= 572 

And plot the results by calling:

plot(survfit(coxph))


  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu