Poisson regression (Generalized linear model GLM)

Poisson regression is fundamental to the modeling of count data. It is the simplest count regression model; and similar to regular multiple linear regression except that the dependent variable is an observed count that follows the Poisson distribution. The Poisson distribution is specified with a single parameter μ – the mean incidence rate. The Poisson distribution has also the property that its mean and variance are equal.

Example

We will use the medpar data (The US national Medicare inpatient hospital database data for Arizona in 1991) as an example to run the Poisson regression using BESH Stat. You can download the source data in csv format here.

We will use following variables from the dataset:

  • los – length of hospital stay (how many days was patient registered in a hospital for a specific disease)
  • hmo – Patients belongs to a Health Maintenance Organization
  • white – Patient identifies themselves as Caucasian
  • type2 – Urgent admission
  • type3 – Elective admission

and fit the model los = hmo + white + type2 + type3.

Note that that data do not have 0 counts (check the minimum value for the los variable in the descriptive statistics table), and also check the histogram indicating that data are skewed.

los
Valid data1495
Mean9.854181
Median8
SD8.832906
SEM0.228446
Variance78.02022
Coefficient of Variation0.896361
Skewness3.651526
Kurtosis29.20186
Q14
Q313
IQR9
Minimum1
Maximum116
Range115

To fit the Poisson model open the medpar.csv and select the BESH Stat main menu on the Excel Add-ins tab > Regression models > Poisson regression. On the Select Variables tab put los as a Response (Outcome) variable and hmo, white, type2, and type3 as a predictor variable(s). Then on the Model specification tab select all 4 predictor variables as an effects. You don’t need to change anything on the Settings tab (this procedure uses log link function by default and it cannot be changed; use the Generalized linear model userform (coming soon in BESH stat) if you need more settings.); and hit OK button to run the analysis.

A new excel workbook will be created with two sheets named – Poisson Regression containing the parameter estimates, standard errors, z statistic, p-value and 95% confidence intervals; and Model analysis table. And the Predictions and Residuals sheet.

Model AnalysisPoisson
Link FunctionLog
Null deviance8901.134077
Residual deviance8142.666001
Full Log Likelihood-6928.907786
Deviance G² (likelihood ratio) chisq758.4680756
df4
p-value7.5998E-163
Deviance goodness of fit chisq8142.666001
df1490
p-value0
Pseudo (McFadden) R-square0.085210274
Pearson goodness of fit chisq9327.983216
df1490
p-value0
AIC13867.81557
AICc13867.85587
BIC13894.36498
Over-dispersion scale parameter6.26039142
Number of Iterations4
Relative Log-Likelihood Change2.91038E-11
Converged?TRUE

Parameter estimates:

VariableCoefficientStd. ErrorZP-value95% CI Lower95% CI Upper
Intercept2.33290.027285.74302.27962.3863
hmo-0.07150.0239-2.9880.0028-0.1185-0.0246
white-0.15390.0274-5.6131.98717E-08-0.2076-0.1001
type20.22170.021110.52900.18040.2629
type30.70950.026127.14600.65830.7607

Interpretation of parameter estimates (Coefficient column in the BESH Stat output) in Poisson regression is in general the change in the log-count (note the log is a link function in the Poisson regression) of the response for a one-unit change in the predictor. For a binary predictor (all predictors in our example are binary), it is the change in the log-count value of the response when the value of the predictor changes from 0 to 1. For example in case of type2 variable in our model, the Urgent admission increase the log-number of length of hospital stay by 0.2217 compared with a non-Urgent admission at its mean (or e0.2217 = 1.248 when expressed the length of hospital stay in number of days instead of log-days).

The model without the adjusted SEs had all p-values as statistically significant. BESH Stat results match the R glm results (see the R code and output at the bottom of this post). Recall that the data do not have 0 counts. The assumption of the Poisson model is that response variable is a count and zero counts must be allowed as a possibility even though there may not be any in a given modeling situation. However it is not the case in our example because the length of the hospital stay is always at least 1 day. This situation often leads to overdispersion. Note the high value of dispersion statistic (Over-dispersion scale parameter) at 6.26 (value 1 indicates dispersion as assumed by the model variance). (The Pearson dispersion parameter is calculated as the Pearson goodness of fit Chi2 statistic divided by degrees of freedom.)

Dispersion corrected estimates are automatically computed by BESH stat. It uses a scaled standard error calculated as square root of dispersion parameter multiplied by the original standard errors (The R glm function with family=quasipoisson use the scaling method described here. See the R output at the bottom of the post.). When scaling standard errors, the Pearson dispersion statistic better captures the excess variability in the data, adjusting model standard errors in such a manner as to reflect what the standard errors would be if the excess variability were not present in the data1. Scaled standard errors calculated for this model:

VariableCoefficientStd. ErrorZP-value95% CI Lower95% CI Upper
Intercept2.33290.068134.2692.8105E-1902.19942.4665
hmo-0.07150.0599-1.1940.2326-0.18910.0460
white-0.15390.0686-2.2430.0250-0.2884-0.0193
type20.22170.05274.2082.72982E-050.11830.3250
type30.70950.065410.8491.9021E-260.58120.8378

Note that the parameter estimates remain unaffected when standard errors are scaled. Scaling is an easy, quick-and-dirty method of adjusting standard errors for overdispersion1. It would be more appropriate to fit the negative binomial model, or zero-truncated model instead. The NB2 negative binomial model can be understand as an extension of Poisson model (i.e. Poisson-gamma mixture) with additional parameter (dispersion parameter) that can better handle over-dispersed data. Negative binomial NB2 regression is provided by BESH Stat starting version 0.21. See how to fit Negative Binomial model using BESH Stat in this post.

 

R code to fit the same model on medpar data:

library(msme)
data(medpar)
attach(medpar)
m1<- glm(los ~ hmo + white + factor(type), family=poisson, data=medpar)
summary(m1)

Call:
glm(formula = los ~ hmo + white + factor(type), family = poisson, data = medpar)

Coefficients:
               Estimate Std. Error  z value  Pr(>|z|) 
(Intercept)    2.33293     0.02721   85.744  < 2e-16 ***
hmo            -0.07155    0.02394   -2.988  0.00281 ** 
white          -0.15387    0.02741   -5.613  1.99e-08 ***
factor(type)2   0.22165    0.02105   10.529  < 2e-16 ***
factor(type)3   0.70948    0.02614   27.146  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 8901.1 on 1494 degrees of freedom
Residual deviance: 8142.7 on 1490 degrees of freedom
AIC: 13868
Number of Fisher Scoring iterations: 5

m2<-glm(los ~ hmo + white + factor(type), family=quasipoisson)
summary(m2) 
Call:
glm(formula = los ~ hmo + white + factor(type), family = quasipoisson)

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|) 
(Intercept)    2.33293     0.06808   34.269  < 2e-16 ***
hmo           -0.07155     0.05991   -1.194  0.233 
white         -0.15387     0.06859   -2.243  0.025 * 
factor(type)2  0.22165     0.05267    4.208  2.73e-05 ***
factor(type)3  0.70948     0.06539   10.849  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 6.260405)

Null deviance: 8901.1 on 1494 degrees of freedom
Residual deviance: 8142.7 on 1490 degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
References
  1. Joseph M. Hilbe. Modeling Count Data. Cambridge University Press, 2014

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.