Applied Econometrics 
Econ 508 - Fall 2014

Professor: Roger Koenker 

TA: Nicolas Bottan 

Welcome to a new issue of e-Tutorial. This e-TA focuses on Duration Models (a.k.a. Survival Analysis) in the context of PS5. [1]

Data

You can download the data set, called weco14.csv, from the Econ 508 web site. Save it in your preferred directory.

See the first section of e-TA 13 on Cubic B-Splines and Quantile Regression for a description of how to prepare the data and save it in Stata format.
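
If you do not have e-TA 13 at hand, the conversion can be sketched as follows. This is only a minimal sketch, assuming that weco14.csv is comma-separated, has variable names in its first row, and sits in the current working directory; the steps in e-TA 13 may differ.

   * read the comma-separated file (in Stata 13 or later, import delimited also works)
   insheet using weco14.csv, comma clear
   * save a Stata-format copy for the commands below
   save weco14.dta, replace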

   use weco14.dta, clear

Next generate the variables needed:

   gen lex2 = lex^2

Survival Analysis

Kaplan-Meier

In Stata, the first thing you need to do is declare your data set as survival-time data. You need to identify the "analysis time" variable and the "failure" variable. The former gives the duration of the process, while the latter indicates whether the observation is censored. In the PS5 data set, "job_tenure" is the analysis-time variable, i.e., the duration of the process, while "status" is the failure variable, equal to 0 if the observation is censored and 1 if it ends in failure.

First, declare the data with stset:

   stset job_tenure, failure(status)

     failure event:  status != 0 & status < .
obs. time interval:  (0, job_tenure]
 exit on or before:  failure

------------------------------------------------------------------------------
      683  total obs.
        0  exclusions
------------------------------------------------------------------------------
      683  obs. remaining, representing
      572  failures in single record/single failure data
   276233  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =      2626
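
Before moving on, it can help to confirm that the declaration matches what you expect: with 683 observations and 572 failures, 111 spells should be censored. A minimal sketch, assuming the data have just been stset as above; stdescribe and stsum are standard st commands, and grouping by sex is only an illustration.

   * describe the survival-time setup (subjects, entry and exit times, failures)
   stdescribe
   * time at risk, incidence rate and survival-time quartiles by group
   stsum, by(sex)
   * count censored (0) versus failed (1) observations
   tabulate status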

Next, you need to generate the Kaplan-Meier estimator for men and women.
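
One way to do this is sketched below, assuming the grouping variable is sex (the variable used in the log-rank test later on) and that the data are already stset. The sts subcommands are standard Stata; the choice of a plot plus a listing is only a suggestion.

   * plot the Kaplan-Meier survivor functions, one curve per value of sex
   sts graph, by(sex)
   * list the estimated survivor function for each group
   sts list, by(sex)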


Next, you may want to test formally for differences between these groups:

   sts test sex

         failure _d:  status
   analysis time _t:  job_tenure


Log-rank test for equality of survivor functions

      |   Events         Events
sex   |  observed       expected
------+-------------------------
0     |       240         278.74
1     |       332         293.26
------+-------------------------
Total |       572         572.00

            chi2(1) =      10.64
            Pr>chi2 =     0.0011
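
The p-value is the upper-tail probability of a chi-squared distribution with one degree of freedom, so it can be checked by hand; the result should come out close to the 0.0011 reported above, which rejects equality of the two survivor functions at conventional levels. In the sketch below, the statistic 10.64 is copied from the output, chi2tail() is a built-in Stata function, and the wilcoxon option is shown only as an alternative weighting, not as something the PS asks for.

   * upper-tail probability of a chi2(1) variable at the log-rank statistic
   display chi2tail(1, 10.64)
   * same comparison using Wilcoxon (Breslow) weights instead of log-rank weights
   sts test sex, wilcoxon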

Cox proportional hazard model

Next, the PS asks you to estimate a Cox proportional hazard model. You can estimate such a model as follows:

   stcox sex dex lex lex2

failure _d: status
analysis time _t: job_tenure

Iteration 0: log likelihood = -3251.0092
Iteration 1: log likelihood = -3143.0237
Iteration 2: log likelihood = -3142.7797
Iteration 3: log likelihood = -3142.7794
Refining estimates:
Iteration 0: log likelihood = -3142.7794

Cox regression -- Breslow method for ties

No. of subjects =          683                     Number of obs   =       683
No. of failures =          572
Time at risk    =       276233
                                                   LR chi2(4)      =    216.46
Log likelihood  =   -3142.7794                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   1.721651    .151811     6.16   0.000     1.448399    2.046454
         dex |   .9122404   .0060938   -13.75   0.000     .9003746    .9242625
         lex |   .3145531   .1027517    -3.54   0.000     .1658216    .5966875
        lex2 |   1.047419    .013782     3.52   0.000     1.020752    1.074783
------------------------------------------------------------------------------

In the output above, a hazard ratio of one is the benchmark: if the hazard ratio is greater than one, an increase in the variable raises the hazard; if it is less than one, an increase in the variable lowers the hazard. Each hazard ratio is the exponential of the corresponding coefficient, so a ratio above one goes with a positive coefficient and a ratio below one with a negative coefficient. You can check this by asking for the coefficients rather than the hazard-ratio representation of the Cox model:

   stcox sex dex lex lex2, nohr
         failure _d:  status
analysis time _t: job_tenure

Iteration 0: log likelihood = -3251.0092
Iteration 1: log likelihood = -3143.0237
Iteration 2: log likelihood = -3142.7797
Iteration 3: log likelihood = -3142.7794
Refining estimates:
Iteration 0: log likelihood = -3142.7794

Cox regression -- Breslow method for ties

No. of subjects =          683                     Number of obs   =       683
No. of failures =          572
Time at risk    =       276233
                                                   LR chi2(4)      =    216.46
Log likelihood  =   -3142.7794                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   .5432834   .0881776     6.16   0.000     .3704585    .7161084
         dex |  -.0918518     .00668   -13.75   0.000    -.1049444   -.0787592
         lex |  -1.156602   .3266594    -3.54   0.000    -1.796843   -.5163618
        lex2 |    .046329   .0131581     3.52   0.000     .0205397    .0721183
------------------------------------------------------------------------------
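
The two tables carry the same information, which you can verify by exponentiating a coefficient. The sketch below assumes the data are stset as above and re-runs the model quietly so that _b[] holds the coefficients; the first display should reproduce the hazard ratio of about 1.72 reported for sex. The last line is an optional extra, not something the PS asks for in this text: because lex enters as a quadratic, it computes the value of lex at which its effect on the log hazard changes sign, roughly 12.5 with the estimates above.

   * re-estimate without printing, so that _b[] holds the coefficients
   quietly stcox sex dex lex lex2, nohr
   * hazard ratio for sex = exp(coefficient on sex)
   display exp(_b[sex])
   * the same check using the number copied from the table
   display exp(.5432834)
   * turning point of the quadratic in lex: -b_lex / (2 * b_lex2)
   display -_b[lex]/(2*_b[lex2])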

  1. Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu