Welcome to a new issue of e-Tutorial. This e-TA will focus on on Duration Models (a.k.a. Survival Analysis) in the context of the PS5. 1
You can download the data set, called weco14.txt from the Econ 536 web site. Save it in your preferred directory.
Then you can load it in R. If you saved it in your hard drive in a folder called econ536 you can load it by typing:
weco<-read.table("C:/econ536/weco14.txt", header=T, sep="")
or you can retrieve it from the web
weco<-read.table("http://www.econ.uiuc.edu/~econ536/Data/weco14.txt", header=T, sep="")
head(weco)
y sex dex lex kwit job_tenure status treatment ypost
1 13.73 0 38 10 FALSE 277 TRUE TRUE 14.35
2 17.15 1 55 11 TRUE 173 TRUE NA NA
3 13.63 1 45 12 FALSE 410 TRUE TRUE 15.75
4 13.04 1 41 11 FALSE 247 TRUE FALSE 18.33
5 13.20 1 42 10 FALSE 340 TRUE FALSE 13.96
6 16.43 0 38 12 FALSE 517 TRUE TRUE 18.54
Next generate the variables needed:
weco$lex2<-weco$lex^2
To do this kind of analysis in R
we are going to use the package survival
require(survival)
Loading required package: survival
Then we need to identify the “analysis time” variable, and the “failure” variable. The former indicates the duration of the process, while the latter indicates whether the data is censored. In the PS5 data set, “tenure” represents the “analysis-time” variable, i.e., the duration of the process, while “status” represents the “failure” variable, assuming values of 0 if it is censored, and 1 if it is failure.
Initially we need to generate the Kaplan-Meier estimator for men and women:
fit <- survfit(Surv(job_tenure,status)~sex, data=weco)
plot(fit, lty=c(1,3), xlab="Time", ylab="Survival Probability")
legend(2000,0.7, legend=c("Female", "Male"), lty=c(1,3) )
Next you may want to test formally for differences between this groups. To do so you can use the survdiff
function:
survdiff(Surv(job_tenure,status)~sex, data=weco)
Call:
survdiff(formula = Surv(job_tenure, status) ~ sex, data = weco)
N Observed Expected (O-E)^2/E (O-E)^2/V
sex=0 296 240 279 5.38 10.6
sex=1 387 332 293 5.12 10.6
Chisq= 10.6 on 1 degrees of freedom, p= 0.00111
Next the PS asks for the estimation of a Cox proportional hazard model. You can estimate such model as follows:
coxph<-coxph(Surv(job_tenure,status)~sex+dex+lex+lex2, data=weco)
coxph
Call:
coxph(formula = Surv(job_tenure, status) ~ sex + dex + lex +
lex2, data = weco)
coef exp(coef) se(coef) z p
sex 0.54443 1.72362 0.08819 6.17 6.7e-10
dex -0.09199 0.91212 0.00668 -13.77 < 2e-16
lex -1.15855 0.31394 0.32682 -3.54 0.00039
lex2 0.04640 1.04750 0.01317 3.52 0.00042
Likelihood ratio test=217 on 4 df, p=0
n= 683, number of events= 572
And plot the results by calling:
plot(survfit(coxph))
Please send comments to hrtdmrt2@illinois.edu or srmntbr2@illinois.edu↩