Poisson regression (Generalized linear model GLM)

Poisson regression is fundamental to the modeling of count data. It is the simplest count regression model; and similar to regular multiple linear regression except that the dependent variable is an observed count that follows the Poisson distribution. The Poisson distribution is specified with a single parameter μ – the mean incidence rate. The Poisson distribution has also the property that its mean and variance are equal.

Example

We will use the medpar data (The US national Medicare inpatient hospital database data for Arizona in 1991) as an example to run the Poisson regression using BESH Stat. You can download the source data in csv format here.

We will use following variables from the dataset:

los – length of hospital stay (how many days was patient registered in a hospital for a specific disease)
hmo – Patients belongs to a Health Maintenance Organization
white – Patient identifies themselves as Caucasian
type2 – Urgent admission
type3 – Elective admission

and fit the model los = hmo + white + type2 + type3.

Note that that data do not have 0 counts (check the minimum value for the los variable in the descriptive statistics table), and also check the histogram indicating that data are skewed.

	los
Valid data	1495
Mean	9.854181
Median	8
SD	8.832906
SEM	0.228446
Variance	78.02022
Coefficient of Variation	0.896361
Skewness	3.651526
Kurtosis	29.20186
Q1	4
Q3	13
IQR	9
Minimum	1
Maximum	116
Range	115

To fit the Poisson model open the medpar.csv and select the BESH Stat main menu on the Excel Add-ins tab > Regression models > Poisson regression. On the Select Variables tab put los as a Response (Outcome) variable and hmo, white, type2, and type3 as a predictor variable(s). Then on the Model specification tab select all 4 predictor variables as an effects. You don’t need to change anything on the Settings tab (this procedure uses log link function by default and it cannot be changed; use the Generalized linear model userform (coming soon in BESH stat) if you need more settings.); and hit OK button to run the analysis.

A new excel workbook will be created with two sheets named – Poisson Regression containing the parameter estimates, standard errors, z statistic, p-value and 95% confidence intervals; and Model analysis table. And the Predictions and Residuals sheet.

Model Analysis	Poisson
Link Function	Log
Null deviance	8901.134077
Residual deviance	8142.666001
Full Log Likelihood	-6928.907786
Deviance G² (likelihood ratio) chisq	758.4680756
df	4
p-value	7.5998E-163
Deviance goodness of fit chisq	8142.666001
df	1490
p-value	0
Pseudo (McFadden) R-square	0.085210274
Pearson goodness of fit chisq	9327.983216
df	1490
p-value	0
AIC	13867.81557
AICc	13867.85587
BIC	13894.36498
Over-dispersion scale parameter	6.26039142
Number of Iterations	4
Relative Log-Likelihood Change	2.91038E-11
Converged?	TRUE

Parameter estimates:

Variable	Coefficient	Std. Error	Z	P-value	95% CI Lower	95% CI Upper
Intercept	2.3329	0.0272	85.743	0	2.2796	2.3863
hmo	-0.0715	0.0239	-2.988	0.0028	-0.1185	-0.0246
white	-0.1539	0.0274	-5.613	1.98717E-08	-0.2076	-0.1001
type2	0.2217	0.0211	10.529	0	0.1804	0.2629
type3	0.7095	0.0261	27.146	0	0.6583	0.7607

Interpretation of parameter estimates (Coefficient column in the BESH Stat output) in Poisson regression is in general the change in the log-count (note the log is a link function in the Poisson regression) of the response for a one-unit change in the predictor. For a binary predictor (all predictors in our example are binary), it is the change in the log-count value of the response when the value of the predictor changes from 0 to 1. For example in case of type2 variable in our model, the Urgent admission increase the log-number of length of hospital stay by 0.2217 compared with a non-Urgent admission at its mean (or e^0.2217 = 1.248 when expressed the length of hospital stay in number of days instead of log-days).

The model without the adjusted SEs had all p-values as statistically significant. BESH Stat results match the R glm results (see the R code and output at the bottom of this post). Recall that the data do not have 0 counts. The assumption of the Poisson model is that response variable is a count and zero counts must be allowed as a possibility even though there may not be any in a given modeling situation. However it is not the case in our example because the length of the hospital stay is always at least 1 day. This situation often leads to overdispersion. Note the high value of dispersion statistic (Over-dispersion scale parameter) at 6.26 (value 1 indicates dispersion as assumed by the model variance). (The Pearson dispersion parameter is calculated as the Pearson goodness of fit Chi2 statistic divided by degrees of freedom.)

Dispersion corrected estimates are automatically computed by BESH stat. It uses a scaled standard error calculated as square root of dispersion parameter multiplied by the original standard errors (The R glm function with family=quasipoisson use the scaling method described here. See the R output at the bottom of the post.). When scaling standard errors, the Pearson dispersion statistic better captures the excess variability in the data, adjusting model standard errors in such a manner as to reflect what the standard errors would be if the excess variability were not present in the data¹. Scaled standard errors calculated for this model:

Variable	Coefficient	Std. Error	Z	P-value	95% CI Lower	95% CI Upper
Intercept	2.3329	0.0681	34.269	2.8105E-190	2.1994	2.4665
hmo	-0.0715	0.0599	-1.194	0.2326	-0.1891	0.0460
white	-0.1539	0.0686	-2.243	0.0250	-0.2884	-0.0193
type2	0.2217	0.0527	4.208	2.72982E-05	0.1183	0.3250
type3	0.7095	0.0654	10.849	1.9021E-26	0.5812	0.8378

Note that the parameter estimates remain unaffected when standard errors are scaled. Scaling is an easy, quick-and-dirty method of adjusting standard errors for overdispersion¹. It would be more appropriate to fit the negative binomial model, or zero-truncated model instead. The NB2 negative binomial model can be understand as an extension of Poisson model (i.e. Poisson-gamma mixture) with additional parameter (dispersion parameter) that can better handle over-dispersed data. Negative binomial NB2 regression is provided by BESH Stat starting version 0.21. See how to fit Negative Binomial model using BESH Stat in this post.

R code to fit the same model on medpar data:

library(msme)
data(medpar)
attach(medpar)
m1<- glm(los ~ hmo + white + factor(type), family=poisson, data=medpar)
summary(m1)

Call:
glm(formula = los ~ hmo + white + factor(type), family = poisson, data = medpar)

Coefficients:
               Estimate Std. Error  z value  Pr(>|z|) 
(Intercept)    2.33293     0.02721   85.744  < 2e-16 ***
hmo            -0.07155    0.02394   -2.988  0.00281 ** 
white          -0.15387    0.02741   -5.613  1.99e-08 ***
factor(type)2   0.22165    0.02105   10.529  < 2e-16 ***
factor(type)3   0.70948    0.02614   27.146  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 8901.1 on 1494 degrees of freedom
Residual deviance: 8142.7 on 1490 degrees of freedom
AIC: 13868
Number of Fisher Scoring iterations: 5

m2<-glm(los ~ hmo + white + factor(type), family=quasipoisson)
summary(m2)

Call:
glm(formula = los ~ hmo + white + factor(type), family = quasipoisson)

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|) 
(Intercept)    2.33293     0.06808   34.269  < 2e-16 ***
hmo           -0.07155     0.05991   -1.194  0.233 
white         -0.15387     0.06859   -2.243  0.025 * 
factor(type)2  0.22165     0.05267    4.208  2.73e-05 ***
factor(type)3  0.70948     0.06539   10.849  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 6.260405)

Null deviance: 8901.1 on 1494 degrees of freedom
Residual deviance: 8142.7 on 1490 degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

References

Joseph M. Hilbe. Modeling Count Data. Cambridge University Press, 2014

Poisson regression (Generalized linear model GLM)

Example

References

Leave a comment

Cancel reply