Welcome to a new issue of e-Tutorial, where we focus on count data models, in particular Poisson and Negative Binomial regression. ¹

Data

You can download the data set, health.dta, from the Econ 508 web site. Save it in your preferred directory.

The data set used here comes from A. Colin Cameron and Per Johansson, “Count Data Regression Using Series Expansion: With Applications”, Journal of Applied Econometrics, Vol. 12, No. 3, 1997, pp. 203-224.

The data are in Stata format. According to the authors, the data set is based on the 1977-78 Australian Health Survey. It contains 5190 observations on the following variables:

NONDOCCO Number of consultations in the past four weeks with non-doctor health professionals (chemist, optician, physiotherapist, etc.)
SEX Gender of patient (female=1)
AGE Age of patient (in years)
INCOME Patient’s annual income (in hundreds of dollars)
LEVYPLUS Dummy for private insurance coverage (=1)
FREEPOOR Dummy for free government insurance coverage due to low income (=1)
FREEREPA Dummy for free government insurance coverage due to old age, disability, or veteran status (=1)
ILLNESS Number of illnesses in past two weeks
ACTDAYS Number of days of reduced activity in past two weeks due to illness or injury
HSCORE Health questionnaire score (high score=bad health)
CHCOND1 Dummy for chronic condition not limiting activity (=1)
CHCOND2 Dummy for chronic condition limiting activity (=1)

According to the authors, the data are overdispersed: the sample mean of the dependent variable is 0.215 and its sample variance is 0.932. This suggests that the "Poissonness" property (equidispersion: the mean equals the variance) may be violated, in which case a Negative Binomial regression might be needed.

The aim of this e-TA is to reproduce the main results of Cameron and Johansson (1997).

First we load the data.

   use health.dta, clear
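
Once the data are loaded, the overdispersion figures quoted earlier can be checked directly. The lines below are a minimal sketch, not part of the original walkthrough, and assume health.dta is already in memory with the variable names listed above:

   * Mean, variance and percentiles of the dependent variable
   summarize NONDOCCO, detail
   * summarize leaves the mean and variance in r(mean) and r(Var)
   display "variance/mean ratio = " r(Var)/r(mean)
   * Frequency table of the counts, to see how many zeros there are
   tabulate NONDOCCO

With the moments reported by the authors, the ratio would be roughly 0.932/0.215, or about 4.3, well above the value of 1 implied by a Poisson distribution. Note that this is an unconditional comparison; the regressions below concern the conditional mean and variance.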

Next we generate the variable age-squared:

   gen AGE2 = AGE^2

Finally, you can run the generalized linear model of your choice. Here we focus on how to run Poisson and Negative Binomial regression models.

Poisson Regression

   poisson NONDOCCO SEX AGE AGE2 INCOME LEVYPLUS FREEPOOR FREEREPA ILLNESS ACTDAYS HSCORE CHCOND1 CHCOND2

Iteration 0:   log likelihood = -4205.1938  
Iteration 1:   log likelihood = -3382.4299  
Iteration 2:   log likelihood = -3112.8639  
Iteration 3:   log likelihood = -3109.3816  
Iteration 4:   log likelihood = -3109.3722  
Iteration 5:   log likelihood = -3109.3722  

Poisson regression                                Number of obs   =       5190
                                                  LR chi2(12)     =    1086.94
                                                  Prob > chi2     =     0.0000
Log likelihood = -3109.3722                       Pseudo R2       =     0.1488

------------------------------------------------------------------------------
    NONDOCCO |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         SEX |   .3316144   .0696475     4.76   0.000     .1951079    .4681209
         AGE |  -3.307805   1.228064    -2.69   0.007    -5.714766   -.9008437
        AGE2 |    4.39034    1.30202     3.37   0.001     1.838428    6.942252
      INCOME |  -.0352855   .1113467    -0.32   0.751    -.2535211    .1829501
    LEVYPLUS |   .3278973    .097685     3.36   0.001     .1364383    .5193564
    FREEPOOR |   .0154283   .2110056     0.07   0.942    -.3981352    .4289917
    FREEREPA |   .4820755   .1160326     4.15   0.000     .2546559    .7094952
     ILLNESS |    .054726   .0215542     2.54   0.011     .0124806    .0969714
     ACTDAYS |   .0979188   .0061003    16.05   0.000     .0859625    .1098751
      HSCORE |   .0447936   .0116531     3.84   0.000     .0219539    .0676332
     CHCOND1 |   .5186225    .087033     5.96   0.000      .348041     .689204
     CHCOND2 |   1.078644   .0983912    10.96   0.000     .8858014    1.271488
       _cons |  -2.443619   .2401184   -10.18   0.000    -2.914242   -1.972995
------------------------------------------------------------------------------
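
Because the Poisson model has a log-linear conditional mean, exponentiating a coefficient gives an incidence-rate ratio. The following is a minimal sketch, not part of the original walkthrough, and assumes the poisson command above has just been run so that its results are still in memory:

   * Replay the last poisson fit, reporting exp(b) instead of b
   poisson, irr
   * The same transformation by hand for a single coefficient
   display exp(_b[SEX])

With the SEX coefficient of .3316144 shown above, exp(.3316144) is roughly 1.39, so the expected number of consultations is about 39 percent higher for women, holding the other regressors fixed.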

You can check the "Poissonness" property after estimation by typing the command below (in newer Stata releases the equivalent command is estat gof):

   poisgof

         Deviance goodness-of-fit =  5040.935
         Prob > chi2(5177)        =    0.9103

         Pearson goodness-of-fit  =  15332.47
         Prob > chi2(5177)        =    0.0000

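The two tests above disagree. The deviance statistic does not reject the null hypothesis (of Poissonness), with p = 0.9103, but the Pearson statistic rejects it decisively (p = 0.0000). The evidence that a Poisson regression is adequate for these data is therefore mixed at best, so below we also estimate the Negative Binomial regression.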
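
A rough complementary check, not part of the original walkthrough, is the ratio of the Pearson statistic to its degrees of freedom; values well above 1 are usually read as a sign of overdispersion. The sketch below copies the numbers from the poisgof output above and then refits the same Poisson model with robust standard errors, which is one common response when overdispersion is suspected:

   * Pearson statistic divided by its degrees of freedom (values copied from the output above)
   display 15332.47/5177
   * Same point estimates as before, but with robust (sandwich) standard errors
   poisson NONDOCCO SEX AGE AGE2 INCOME LEVYPLUS FREEPOOR FREEREPA ILLNESS ACTDAYS HSCORE CHCOND1 CHCOND2, vce(robust)

The ratio comes out at about 2.96. The robust refit leaves the coefficients unchanged and only alters the standard errors, so it is a way to gauge how sensitive the inference is to the equidispersion assumption rather than a replacement for the Negative Binomial model.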

Negative Binomial Regression

You can run a Negative Binomial Regression as follows:

   nbreg NONDOCCO SEX AGE AGE2 INCOME LEVYPLUS FREEPOOR FREEREPA ILLNESS ACTDAYS HSCORE CHCOND1 CHCOND2

Fitting Poisson model:

Iteration 0:   log likelihood = -4205.1938  
Iteration 1:   log likelihood = -3382.4299  
Iteration 2:   log likelihood = -3112.8639  
Iteration 3:   log likelihood = -3109.3816  
Iteration 4:   log likelihood = -3109.3722  
Iteration 5:   log likelihood = -3109.3722  

Fitting constant-only model:

Iteration 0:   log likelihood =  -2940.014  (not concave)
Iteration 1:   log likelihood = -2313.0946  
Iteration 2:   log likelihood =  -2312.632  
Iteration 3:   log likelihood =  -2312.632  

Fitting full model:

Iteration 0:   log likelihood = -2214.1623  
Iteration 1:   log likelihood = -2205.8669  
Iteration 2:   log likelihood = -2161.0568  
Iteration 3:   log likelihood = -2160.4958  
Iteration 4:   log likelihood = -2160.4953  
Iteration 5:   log likelihood = -2160.4953  

Negative binomial regression                      Number of obs   =       5190
                                                  LR chi2(12)     =     304.27
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -2160.4953                       Pseudo R2       =     0.0658

------------------------------------------------------------------------------
    NONDOCCO |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         SEX |   .2308069   .1246663     1.85   0.064    -.0135347    .4751484
         AGE |  -2.675554   2.430949    -1.10   0.271    -7.440127     2.08902
        AGE2 |   3.854298    2.62543     1.47   0.142    -1.291451    9.000047
      INCOME |  -.0621359   .1906016    -0.33   0.744    -.4357083    .3114364
    LEVYPLUS |   .2986752   .1583588     1.89   0.059    -.0117024    .6090528
    FREEPOOR |   -.196811   .3515133    -0.56   0.576    -.8857645    .4921425
    FREEREPA |   .5877179   .2185872     2.69   0.007     .1592949    1.016141
     ILLNESS |   .1443791   .0467823     3.09   0.002     .0526875    .2360708
     ACTDAYS |   .1370558   .0170753     8.03   0.000     .1035888    .1705228
      HSCORE |   .0739655   .0279625     2.65   0.008     .0191601    .1287709
     CHCOND1 |   .4115285    .142977     2.88   0.004     .1312987    .6917583
     CHCOND2 |   1.124148   .1830997     6.14   0.000     .7652796    1.483017
       _cons |  -2.783845   .4351314    -6.40   0.000    -3.636687   -1.931003
-------------+----------------------------------------------------------------
    /lnalpha |   2.187067   .0758164                      2.038469    2.335664
-------------+----------------------------------------------------------------
       alpha |    8.90904   .6754515                      7.678844    10.33632
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 1897.75 Prob>=chibar2 = 0.000

As we know, the negative binomial model does not assume that the conditional mean equals the conditional variance. The departure from equality is captured by a dispersion parameter (alpha), where alpha = 0 corresponds to the equidispersion case (i.e. Poisson). What does the test suggest?
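
The quantities in the last lines of the nbreg output can be reproduced by hand from numbers already printed above. This is a minimal sketch using display as a calculator; all values are copied from the output, and nothing needs to be in memory:

   * LR statistic: twice the gap between the negative binomial and Poisson log likelihoods
   display 2*(-2160.4953 - (-3109.3722))
   * chibar2(01) p-value: half the chi2(1) upper tail, because alpha = 0 lies on the boundary of the parameter space
   display 0.5*chi2tail(1, 1897.75)
   * alpha is the exponential of the estimated /lnalpha
   display exp(2.187067)

The first line gives 1897.7538, which matches the reported chibar2(01) statistic up to rounding, and the last line recovers the reported alpha of about 8.909.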

¹ Please send comments to bottan2@illinois.edu or srmntbr2@illinois.edu

Contact	Office Hours	E-mail
Prof. Roger Koenker	M. & W. 2:30-3:30 or by appointment (126 DKH)	rkoenker@illinois.edu
TA Nicolas Bottan	TBA	bottan2@illinois.edu

Applied Econometrics
Econ 508 - Fall 2014

Professor: Roger Koenker

TA: Nicolas Bottan

e-TA 16: Count Data Models
