Negative binomial regression is used for modeling count variables, usually overdispersed count outcomes, where it accounts for the extra-Poisson variation. BESH Stat fits negative binomial regression with the traditional NB2 [1] parametrization (see [2], chapter 8, for the derivation of the NB2 model), which can be derived as a Poisson-gamma mixture with variance equal to μ + αμ², where μ is the Poisson variance, αμ² is the gamma variance, and α is called the dispersion parameter (alternatively, the heterogeneity parameter or overdispersion parameter). If the data are not overdispersed, i.e. the data are Poisson, then α = 0.
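The Poisson-gamma mixture can be checked by simulation. The following R sketch is not part of BESH Stat, and the values of mu, alpha and the sample size are arbitrary choices for illustration. It draws a gamma-distributed multiplier with mean 1 and variance α, uses it to scale a Poisson mean, and compares the sample variance of the resulting counts with μ + αμ².

```r
# Simulate an NB2 outcome as a Poisson-gamma mixture (illustrative values only).
set.seed(1)
n     <- 200000   # number of simulated observations (arbitrary)
mu    <- 10       # mean count (arbitrary)
alpha <- 0.5      # dispersion parameter (arbitrary)

# Gamma heterogeneity term with mean 1 and variance alpha
g <- rgamma(n, shape = 1 / alpha, scale = alpha)

# Conditional on g, the count is Poisson with mean mu * g
y <- rpois(n, lambda = mu * g)

theoretical_var <- mu + alpha * mu^2   # NB2 variance function
empirical_var   <- var(y)

cat("mean:", mean(y), "\n")
cat("variance (simulated):", empirical_var, "\n")
cat("variance (mu + alpha * mu^2):", theoretical_var, "\n")
```

Setting alpha close to 0 in this sketch makes the gamma multiplier collapse towards 1, so the simulated variance approaches the Poisson value μ.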
Example:
We will use the same example as in the recent Poisson regression post and fit the same model, this time assuming a negative binomial distribution for the response. You can download the source data in CSV format here and use the following variables to replicate the results below on your own.
- los – length of hospital stay (the number of days the patient was registered in a hospital for a specific disease)
- hmo – Patient belongs to a Health Maintenance Organization
- white – Patient identifies themselves as Caucasian
- type2 – Urgent admission
- type3 – Elective admission
The fitted model is los = hmo + white + type2 + type3.
To fit the model, open medpar.csv and select the BESH Stat main menu on the Excel Add-ins tab > Regression models > Negative Binomial regression. On the Select Variables tab, put los as the Response (Outcome) variable and hmo, white, type2, and type3 as predictor variables. Then, on the Model specification tab, select all 4 predictor variables as effects. You don’t need to change anything on the Settings tab (this procedure uses the log link function by default and it cannot be changed). Click the OK button to run the analysis.
A new Excel workbook will be created with two sheets. The Negative Binomial Regression sheet contains the Model analysis table and the parameter estimates with their standard errors, z statistics, p-values and 95% confidence intervals. The second sheet is named Predictions and Residuals.
Model Analysis | Negative Binomial (NB2) |
Link Function | Log |
Null deviance | 1691.050716 |
Residual deviance | 1568.14286 |
Full Log Likelihood | -4797.476603 |
Deviance G² (likelihood ratio) chisq | 122.9078557 |
df | 4 |
p-value | 1.27779E-25 |
Deviance goodness of fit chisq | 1568.14286 |
df | 1490 |
p-value | 0.077899802 |
Pseudo (McFadden) R-square | 0.072681354 |
Pearson goodness of fit chisq | 1624.538253 |
df | 1490 |
p-value | 0.008060318 |
AIC | 9606.953205 |
AICc | 9606.993474 |
BIC | 9638.812494 |
Dispersion | 1.09029413 |
Variance function V(u)= | u+(0.4457567)u^2 |
Number of Iterations | 3 |
Last Relative Deviance + Dispersion Change | 5.66629E-13 |
Converged? | TRUE |
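Several rows of the Model analysis table can be recomputed from the others. The R sketch below uses only the numbers printed in the table together with textbook definitions (difference of deviances, chi-square tail probabilities, Pearson chi-square divided by its degrees of freedom). These definitions reproduce the reported figures, but whether BESH Stat computes them in exactly this way is an assumption.

```r
# Values copied from the Model analysis table
null_dev  <- 1691.050716
resid_dev <- 1568.14286
pearson   <- 1624.538253
df_model  <- 4      # number of predictors
df_resid  <- 1490   # residual degrees of freedom

# Likelihood ratio (deviance) test against the intercept-only model
lr_chisq <- null_dev - resid_dev
lr_p     <- pchisq(lr_chisq, df = df_model, lower.tail = FALSE)

# Goodness-of-fit tests based on the residual deviance and the Pearson statistic
dev_p     <- pchisq(resid_dev, df = df_resid, lower.tail = FALSE)
pearson_p <- pchisq(pearson,   df = df_resid, lower.tail = FALSE)

# Pearson dispersion statistic and deviance-based pseudo R-square
dispersion <- pearson / df_resid
pseudo_r2  <- 1 - resid_dev / null_dev

print(c(lr_chisq = lr_chisq, dispersion = dispersion, pseudo_r2 = pseudo_r2))
print(c(lr_p = lr_p, dev_p = dev_p, pearson_p = pearson_p))
```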
Parameter estimates:
Variable | Coefficient | Std. Error | Z | P-value | 95% CI Lower | 95% CI Upper |
Intercept | 2.31027898 | 0.067446766 | 34.25336899 | 0 | 2.178085747 | 2.442472213 |
hmo | -0.06795522 | 0.053213752 | -1.277023656 | 0.201593896 | -0.172252258 | 0.036341817 |
white | -0.129065489 | 0.068362722 | -1.887951292 | 0.05903249 | -0.263053961 | 0.004922984 |
type2 | 0.221248963 | 0.050457469 | 4.384860464 | 1.1606E-05 | 0.12235414 | 0.320143786 |
type3 | 0.706158808 | 0.075998501 | 9.291746551 | 0 | 0.557204483 | 0.855113132 |
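The Z, P-value and confidence interval columns follow from the coefficients and standard errors alone. The following R sketch assumes the usual Wald construction (z = coefficient / standard error, two-sided normal p-value, and coefficient ± 1.96 standard errors); the variable names are chosen for this illustration.

```r
# Coefficients and standard errors copied from the Parameter estimates table
est <- c(Intercept = 2.31027898, hmo = -0.06795522, white = -0.129065489,
         type2 = 0.221248963, type3 = 0.706158808)
se  <- c(0.067446766, 0.053213752, 0.068362722, 0.050457469, 0.075998501)

z     <- est / se                                # Wald z statistic
p     <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value
crit  <- qnorm(0.975)                            # normal quantile, about 1.96
lower <- est - crit * se                         # 95% CI lower limit
upper <- est + crit * se                         # 95% CI upper limit

print(round(cbind(est, se, z, p, lower, upper), 6))
```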
The interpretation of the parameter estimates (the Coefficient column in the Parameter estimates table) is the same as for the Poisson model, because the NB2 model fitted by BESH Stat uses the log link function. For a binary predictor (all predictors in our example are binary), the coefficient is the change in the log of the expected count when the predictor changes from 0 to 1, with the other predictors held fixed. For example, for the type2 variable in our model, an urgent admission increases the expected log length of hospital stay by 0.2212 compared with a non-urgent admission. On the original scale of days, this multiplies the expected length of stay by e^0.2212 ≈ 1.25, i.e. a stay about 25% longer.
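Because of the log link, exponentiating a coefficient gives a multiplicative effect on the expected number of days. The R sketch below applies this to the coefficients and confidence limits printed in the Parameter estimates table; the data frame and column names are made up for this illustration.

```r
# Coefficients and 95% CI limits copied from the Parameter estimates table
coefs <- data.frame(
  row.names = c("hmo", "white", "type2", "type3"),
  est   = c(-0.06795522, -0.129065489, 0.221248963, 0.706158808),
  lower = c(-0.172252258, -0.263053961, 0.12235414, 0.557204483),
  upper = c(0.036341817, 0.004922984, 0.320143786, 0.855113132)
)

# Exponentiate to move from the log scale to multiplicative effects on days
irr <- exp(coefs)
irr$pct_change <- 100 * (irr$est - 1)   # percent change in expected stay

print(round(irr, 3))
```

An interval that contains 1 on this scale corresponds to a coefficient interval that contains 0, which is the case for hmo and white here.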
Recall the Poisson regression fit, where we observed a Pearson dispersion statistic of 6.26, indicating overdispersion. The negative binomial model is a substantial improvement over the Poisson model: the dispersion has been reduced to 1.09 (see the Dispersion row in the Model analysis table), indicating that the data still have slightly more variability than the negative binomial model accounts for, but far less than under the Poisson model. From the Parameter estimates table we see that hmo is clearly not significant (p = 0.20) and that white is not significant at the 5% level either (p = 0.059). This differs from the Poisson model output, where the white parameter was significant even with scaled standard errors.
The parameter estimates and standard errors provided by BESH Stat match the results provided by R (see the output at the bottom of the post). Note that R reports the dispersion as Theta, which is the reciprocal of α: the BESH Stat estimate α = 0.4457567 (see the “Variance function V(u)=” row in the Model analysis table) corresponds to 1/0.4457567 = 2.24337626, which matches the value R reports as Theta: 2.2434.
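The conversion between the two dispersion scales is simple arithmetic, sketched here in R with the values quoted in this post. The standard error of α is not reported by either program in the output shown; the delta-method approximation se(α) ≈ se(θ) / θ² used below is an assumption added for illustration.

```r
alpha_besh <- 0.4457567   # BESH Stat, "Variance function V(u)=" row
theta_r    <- 2.2434      # R, "Theta" in the glm.nb summary
se_theta_r <- 0.0997      # R, "Std. Err." of Theta

theta_from_alpha <- 1 / alpha_besh   # BESH Stat estimate on R's scale
alpha_from_theta <- 1 / theta_r      # R estimate on BESH Stat's scale

# Delta-method approximation for the standard error of alpha (assumption)
se_alpha <- se_theta_r / theta_r^2

# Both parametrizations give the same variance at any mean, e.g. mu = 10
mu      <- 10
v_alpha <- mu + alpha_besh * mu^2
v_theta <- mu + mu^2 / theta_from_alpha

print(c(theta_from_alpha = theta_from_alpha,
        alpha_from_theta = alpha_from_theta,
        se_alpha = se_alpha))
print(c(v_alpha = v_alpha, v_theta = v_theta))
```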
To compare the Poisson and negative binomial (NB2) models we can also use the Akaike information criterion (AIC) provided in the Model Analysis table. Lower AIC values are better, and AIC penalizes models that use more parameters. From the BESH Stat output we see that the NB2 AIC of 9607 is lower than the Poisson model AIC of 13868. Based on the AIC, we conclude that the NB2 model is preferable.
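The AIC and BIC rows can be rebuilt from the full log likelihood. The R sketch below assumes k = 6 estimated parameters (five regression coefficients plus α) and n = 1495 observations (the 1494 null degrees of freedom in the R output plus one); with these assumptions the standard formulas reproduce the reported values. The Poisson AIC is the figure quoted above, not recomputed here.

```r
loglik <- -4797.476603   # "Full Log Likelihood" row
k      <- 6              # assumed: 5 regression coefficients + alpha
n      <- 1495           # assumed: null degrees of freedom (1494) + 1

aic_nb2 <- -2 * loglik + 2 * k
bic_nb2 <- -2 * loglik + k * log(n)

aic_poisson <- 13868     # Poisson model AIC as quoted in the text
delta_aic   <- aic_poisson - aic_nb2   # positive values favor NB2

print(c(aic_nb2 = aic_nb2, bic_nb2 = bic_nb2, delta_aic = delta_aic))
```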
R code to fit the same model on medpar data:
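library(MASS)   # provides glm.nb()
library(msme)   # provides the medpar data set
data(medpar)
# NB2 model with the log link (the glm.nb default)
m1 <- glm.nb(los ~ hmo + white + type2 + type3, data = medpar)
summary(m1)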
Call:
glm.nb(formula = los ~ hmo + white + type2 + type3, data = medpar,
    init.theta = 2.243376203, link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.31028    0.06745  34.253  < 2e-16 ***
hmo         -0.06796    0.05321  -1.277    0.202
white       -0.12907    0.06836  -1.888    0.059 .
type2        0.22125    0.05046   4.385 1.16e-05 ***
type3        0.70616    0.07600   9.292  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(2.2434) family taken to be 1)

    Null deviance: 1691.1  on 1494  degrees of freedom
Residual deviance: 1568.1  on 1490  degrees of freedom
AIC: 9607

Number of Fisher Scoring iterations: 1

              Theta:  2.2434
          Std. Err.:  0.0997

 2 x log-likelihood:  -9594.9530
References:
- Cameron, A. C. and P.K. Trivedi (1986). Econometric models based on count data: Comparisons and applications of some estimators, Journal of Applied Econometrics 1: 29–53.
- Hilbe, J. (2011). Negative Binomial Regression, 2nd ed., New York: Cambridge University Press.