# e-TA 17: Survival Analysis

Welcome to a new issue of e-Tutorial. This e-TA focuses on
duration models (a.k.a. survival analysis) in the context of PS5.^{1}

# Data

You can download the data set, *weco14.csv*, from the Econ 508 web site. Save it in your preferred directory.

See the first section of e-TA 13, on Cubic B-Splines and Quantile Regression, for a description of how to prepare the data and save it in Stata format.
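For readers without e-TA 13 at hand, the conversion step can be sketched roughly as follows. This is only a sketch and not necessarily the procedure used in e-TA 13: it assumes that *weco14.csv* sits in the current working directory, that its first row holds the variable names, and that your Stata version provides `import delimited` (older versions use `insheet` instead).

```stata
* Sketch only: assumes weco14.csv is in the working directory with a header row
import delimited using weco14.csv, clear

* Save a Stata-format copy so that it can later be loaded with -use-
save weco14.dta, replace
```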

Then load the Stata file:

`use weco14.dta, clear`

Next, generate the squared term used in the Cox model below:

`gen lex2 = lex^2`

# Survival Analysis

## Kaplan-Meier

In Stata, the first thing you need to do is declare your data set as survival-time data. To do so, you identify the "analysis time" variable and the "failure" variable. The former gives the duration of the process, while the latter indicates whether the observation is censored.

In the PS5 data set, "job_tenure" is the analysis-time variable, i.e., the duration of the process, while "status" is the failure variable, taking the value 0 if the observation is censored and 1 if it ends in failure:

`stset job_tenure, failure(status)`

         failure event:  status != 0 & status < .
    obs. time interval:  (0, job_tenure]
     exit on or before:  failure

    ------------------------------------------------------------------------------
            683  total obs.
              0  exclusions
    ------------------------------------------------------------------------------
            683  obs. remaining, representing
            572  failures in single record/single failure data
         276233  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =      2626
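Before moving on, it can be worth confirming that Stata read the censoring indicator as intended. The commands below are a minimal sketch of such a check, assuming the data have been loaded and `stset` as above. Given the output above, the count of censored spells should equal 683 - 572 = 111.

```stata
* Assumes weco14.dta is loaded and -stset job_tenure, failure(status)- has been run
* Tabulate the raw censoring indicator: 0 = censored, 1 = failure
tabulate status

* Count the censored spells using the _d variable created by stset
count if _d == 0

* Describe the survival-time setup and summarize the durations
stdescribe
stsum
```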

Next, compare the Kaplan-Meier survivor functions of men and women. The command below runs the log-rank test for equality of the two survivor functions:

`sts test sex`

             failure _d:  status
       analysis time _t:  job_tenure

    Log-rank test for equality of survivor functions

          |   Events         Events
    sex   |  observed       expected
    ------+-------------------------
    0     |       240         278.74
    1     |       332         293.26
    ------+-------------------------
    Total |       572         572.00

                chi2(1) =      10.64
                Pr>chi2 =     0.0011

With a p-value of 0.0011, the test rejects equality of the two survivor functions at conventional significance levels.
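The log-rank test compares the two survivor functions but does not display them. A minimal sketch for listing and plotting the Kaplan-Meier estimates by sex, assuming the data are still `stset` as above:

```stata
* Assumes weco14.dta is loaded and -stset job_tenure, failure(status)- has been run
* Tabulate the Kaplan-Meier survivor function separately for each value of sex
sts list, by(sex)

* Plot the two Kaplan-Meier survivor curves on one graph
sts graph, by(sex)
```

If the curve for one group lies below the other over most of the time axis, that group leaves the job faster, which is what the gap between observed and expected events in the test table suggests.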

## Cox proportional hazard model

Next, the problem set asks you to estimate a Cox proportional hazards model. You can estimate it as follows:

`stcox sex dex lex lex2`

             failure _d:  status
       analysis time _t:  job_tenure

    Iteration 0:   log likelihood = -3251.0092
    Iteration 1:   log likelihood = -3143.0237
    Iteration 2:   log likelihood = -3142.7797
    Iteration 3:   log likelihood = -3142.7794
    Refining estimates:
    Iteration 0:   log likelihood = -3142.7794

    Cox regression -- Breslow method for ties

    No. of subjects =          683                     Number of obs   =       683
    No. of failures =          572
    Time at risk    =       276233
                                                       LR chi2(4)      =    216.46
    Log likelihood  =   -3142.7794                     Prob > chi2     =    0.0000

    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             sex |   1.721651    .151811     6.16   0.000     1.448399    2.046454
             dex |   .9122404   .0060938   -13.75   0.000     .9003746    .9242625
             lex |   .3145531   .1027517    -3.54   0.000     .1658216    .5966875
            lex2 |   1.047419    .013782     3.52   0.000     1.020752    1.074783
    ------------------------------------------------------------------------------
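For reference, the model behind this table can be written out as below. The notation ($h_0$, $\beta_1,\dots,\beta_4$, $x_k$) is introduced here for illustration and is not taken from the problem set.

$$
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1\,\text{sex} + \beta_2\,\text{dex} + \beta_3\,\text{lex} + \beta_4\,\text{lex}^2\right)
$$

$$
\frac{h(t \mid x_k + 1)}{h(t \mid x_k)} = \exp(\beta_k)
$$

The second expression is the hazard ratio that Stata reports for a covariate $x_k$ that enters linearly. For example, the estimate of 1.72 for sex says that the hazard for the group coded 1 is about 72% higher than for the group coded 0, and the estimate of 0.912 for dex says that each additional unit of dex lowers the hazard by about 8.8%, in both cases holding the other covariates fixed. Because lex enters through both lex and lex2, its two hazard ratios should be read jointly rather than one at a time.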

In the output above, a hazard ratio of one is the benchmark: if the hazard ratio is greater than one, an increase in the variable raises the hazard; if the hazard ratio is less than one, an increase in the variable lowers the hazard. You can check this by asking for the coefficients rather than the hazard-ratio representation of the Cox model:

`stcox sex dex lex lex2, nohr`

             failure _d:  status
       analysis time _t:  job_tenure

    Iteration 0:   log likelihood = -3251.0092
    Iteration 1:   log likelihood = -3143.0237
    Iteration 2:   log likelihood = -3142.7797
    Iteration 3:   log likelihood = -3142.7794
    Refining estimates:
    Iteration 0:   log likelihood = -3142.7794

    Cox regression -- Breslow method for ties

    No. of subjects =          683                     Number of obs   =       683
    No. of failures =          572
    Time at risk    =       276233
                                                       LR chi2(4)      =    216.46
    Log likelihood  =   -3142.7794                     Prob > chi2     =    0.0000

    ------------------------------------------------------------------------------
              _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             sex |   .5432834   .0881776     6.16   0.000     .3704585    .7161084
             dex |  -.0918518     .00668   -13.75   0.000    -.1049444   -.0787592
             lex |  -1.156602   .3266594    -3.54   0.000    -1.796843   -.5163618
            lex2 |    .046329   .0131581     3.52   0.000     .0205397    .0721183
    ------------------------------------------------------------------------------
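The link between the two tables can be verified directly from the stored estimates. The commands below are a minimal sketch, assuming one of the `stcox` commands above was the most recent estimation command. Exponentiating the coefficients should reproduce the hazard ratios reported earlier (about 1.72 for sex and 0.91 for dex), and the last two commands locate the value of lex at which the quadratic effect on the log hazard changes direction.

```stata
* Assumes -stcox sex dex lex lex2- (with or without nohr) was just estimated
* Recover hazard ratios by exponentiating the stored coefficients
display exp(_b[sex])
display exp(_b[dex])

* Value of lex at which the quadratic lex effect on the log hazard turns
display -_b[lex] / (2 * _b[lex2])

* Same turning point, with a delta-method standard error
nlcom -_b[lex] / (2 * _b[lex2])
```

With the coefficients above, the turning point is about 12.5. Since the coefficient on lex is negative and the one on lex2 is positive, the estimated hazard falls with lex up to that value and rises afterwards, other covariates held fixed.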

^{1} Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu.