Welcome to a new issue of e-Tutorial. This e-TA will focus on on Duration Models (a.k.a. Survival Analysis) in the context of the PS5. 1

Data

You can download the data set, called weco14.txt from the Econ 536 web site. Save it in your preferred directory.

Then you can load it in R. If you saved it in your hard drive in a folder called econ536 you can load it by typing:

weco<-read.table("C:/econ536/weco14.txt", header=T, sep="")

or you can retrieve it from the web

weco<-read.table("http://www.econ.uiuc.edu/~econ536/Data/weco14.txt", header=T, sep="")
head(weco)
      y sex dex lex  kwit job_tenure status treatment ypost
1 13.73   0  38  10 FALSE        277   TRUE      TRUE 14.35
2 17.15   1  55  11  TRUE        173   TRUE        NA    NA
3 13.63   1  45  12 FALSE        410   TRUE      TRUE 15.75
4 13.04   1  41  11 FALSE        247   TRUE     FALSE 18.33
5 13.20   1  42  10 FALSE        340   TRUE     FALSE 13.96
6 16.43   0  38  12 FALSE        517   TRUE      TRUE 18.54

Next generate the variables needed:

weco$lex2<-weco$lex^2

Survival Analysis

Kaplan-Meier

To do this kind of analysis in R we are going to use the package survival

require(survival)
Loading required package: survival

Then we need to identify the “analysis time” variable, and the “failure” variable. The former indicates the duration of the process, while the latter indicates whether the data is censored. In the PS5 data set, “tenure” represents the “analysis-time” variable, i.e., the duration of the process, while “status” represents the “failure” variable, assuming values of 0 if it is censored, and 1 if it is failure.

Initially we need to generate the Kaplan-Meier estimator for men and women:

fit <- survfit(Surv(job_tenure,status)~sex, data=weco)
plot(fit, lty=c(1,3), xlab="Time", ylab="Survival Probability")
legend(2000,0.7, legend=c("Female", "Male"), lty=c(1,3) )

Next you may want to test formally for differences between this groups. To do so you can use the survdiff function:

survdiff(Surv(job_tenure,status)~sex, data=weco)
Call:
survdiff(formula = Surv(job_tenure, status) ~ sex, data = weco)

        N Observed Expected (O-E)^2/E (O-E)^2/V
sex=0 296      240      279      5.38      10.6
sex=1 387      332      293      5.12      10.6

 Chisq= 10.6  on 1 degrees of freedom, p= 0.00111 

Cox proportional hazard model

Next the PS asks for the estimation of a Cox proportional hazard model. You can estimate such model as follows:

coxph<-coxph(Surv(job_tenure,status)~sex+dex+lex+lex2, data=weco)
coxph
Call:
coxph(formula = Surv(job_tenure, status) ~ sex + dex + lex + 
    lex2, data = weco)

         coef exp(coef) se(coef)      z       p
sex   0.54443   1.72362  0.08819   6.17 6.7e-10
dex  -0.09199   0.91212  0.00668 -13.77 < 2e-16
lex  -1.15855   0.31394  0.32682  -3.54 0.00039
lex2  0.04640   1.04750  0.01317   3.52 0.00042

Likelihood ratio test=217  on 4 df, p=0
n= 683, number of events= 572 

And plot the results by calling:

plot(survfit(coxph))


  1. Please send comments to hrtdmrt2@illinois.edu or srmntbr2@illinois.edu