Linear regression, Poisson GLM, and repeated-measures GEE using workbook-based Excel workflows
BESHStatNG makes it possible to run a broad range of regression workflows directly in Excel, from standard linear models to generalized linear models and repeated-measures models with generalized estimating equations.
This tutorial introduces three practical workflows:
- Multiple Linear Regression (LM) for a continuous outcome
- Generalized Linear Model (GLM) for count data with a Poisson log-link and offset
- Generalized Estimating Equations (GEE) for repeated count data with within-subject correlation
The examples are built around real treatment-study datasets and are designed to show both sides of the BESHStatNG workflow:
- the GUI-based analysis
- the worksheet UDF workflow using
BESH.REGR.*formulas
The goal is not only to show where to click, but also how to organize reusable Excel workbooks for modeling, checking results, and building template-based analyses.
Download the tutorial workbooks
This tutorial uses two workbook examples.
1. Linear regression workbook
This workbook uses a treatment-study dataset with pre-treatment and post-treatment body weight measurements. It demonstrates an ANCOVA-style linear regression workflow with:
Postwtas the dependent variable- treatment group as a categorical predictor
- baseline weight (
Prewt) as a covariate
2. GLM and GEE workbook
[Download the GLM and GEE workbook]
This workbook uses repeated seizure-count data from an epilepsy treatment study. It demonstrates:
- Poisson GLM with log link and offset
- Poisson GEE with exchangeable working correlation and robust standard errors
Each workbook contains:
- raw data
- cleaned analysis data
- BESHStatNG UDF formulas
- reference result sheets
- GUI placeholder sheets for screenshots and final output discussion
Why these examples were chosen
These two workbook examples show three distinct modeling situations that Excel users often need in practice:
- LM when the outcome is continuous
- GLM when the outcome is a count and an offset is needed
- GEE when repeated measurements from the same subject must be modeled without assuming independence
Together, they show how BESHStatNG supports both:
- one-time dialog-driven analysis
- reusable formula-driven workbook design
That is one of the strongest reasons to use BESHStatNG instead of relying only on ad hoc Excel workflows.
Part I — Multiple Linear Regression (LM)
The first example uses a treatment-study dataset with three treatment groups:
- Cont
- CBT
- FT
The tutorial model uses:
- Dependent variable:
Postwt - Predictors:
TreatCodeandPrewt
This is an ANCOVA-style model: post-treatment weight is modeled as a function of baseline weight and treatment group.
That makes this a good first regression tutorial because it answers a simple practical question:
After adjusting for baseline weight, do the treatment groups differ in post-treatment weight?
Running the linear model in BESHStatNG
Open:
BESH Stat NG → Analyse → Regression → Multiple Linear Regression (LM)
Use the Analysis_Data worksheet from the workbook.
Set:
- Outcome:
Postwt - Predictors:
TreatCode,Prewt
Then go to the Build Model tab and define:
TreatCodeas a categorical factorPrewtas a continuous predictor- include the intercept
In the Options tab for this example:
- use Type III Sum-of-Squares
- compute residuals




LM options showing Type III sums of squares and residual computation.
Interpreting the linear model
The workbook results show the following coefficient estimates:
- Intercept: 45.6740, p = 0.00095
- TreatCode[2] (CBT vs Cont): 4.0971, p = 0.0340
- TreatCode[3] (FT vs Cont): 8.6601, p = 0.00019
- Prewt: 0.4345, p = 0.00885
Model-level results:
- N = 72
- R-squared = 0.2777
- Adjusted R-squared = 0.2458
- F-statistic = 8.7126
- Model p-value = 0.000057
Practical interpretation
This model suggests that, after adjusting for baseline weight:
- the CBT group has a significantly higher post-treatment weight than the control group
- the FT group has an even larger positive difference relative to control
- higher baseline weight is also associated with higher post-treatment weight
In plain language, both treatment effects are positive after controlling for starting weight, and the FT effect is the strongest of the two.
LM workbook and UDF workflow
The LM workbook also includes a formula-based modeling workflow using BESHStatNG UDFs.
On the BESH_UDFs sheet, the model is built with (see documentation):
BESH.REGR.FORMULA_VALIDATEBESH.REGR.LM_FITBESH.REGR.LM_SUMMARYBESH.REGR.LM_TESTSBESH.REGR.LM_ANOVABESH.REGR.LM_PRED
This makes it possible to keep the analysis logic directly in worksheet cells rather than only inside GUI dialogs.
That is particularly useful when you want to build Excel-based reporting templates or re-run the same analysis structure on updated data.
Part II — Generalized Linear Model (GLM)
The second example uses repeated seizure-count data from an epilepsy treatment study.
For the GLM example, each two-week observation is treated as a separate count outcome, and the model includes:
- Dependent variable:
SeizureCount - Predictors:
TreatDrug,Age,V4 - Offset:
LogBase4Offset
This is a Poisson regression with log link.
The offset is included to account for baseline seizure exposure:
log(Base8wk / 4)
This is a useful tutorial example because it shows how BESHStatNG handles a practical count-data workflow in Excel.
Running the Poisson GLM in BESHStatNG
Open:
BESH Stat NG → Analyse → Regression → Generalized Linear Models (GLM)
Use the Analysis_Data worksheet from the workbook.
Set:
- Outcome:
SeizureCount - Offset:
LogBase4Offset - Predictors:
TreatDrug,Age,V4
In the Build Model tab:
- set
TreatDrugas a categorical factor - keep
AgeandV4as numeric effects
In the Options tab:
- Family: Poisson
- Link: Log




Interpreting the Poisson GLM
The workbook GLM results are:
- Intercept: -0.2059, p = 0.0702
- TreatDrug[1]: -0.0625, p = 0.1828
- Age: 0.0121, p = 0.0010
- V4: -0.1611, p = 0.0032
Exponentiated effects:
- TreatDrug IRR: 0.939
- Age IRR: 1.012
- V4 IRR: 0.851
Model-level values:
- N = 236
- Df Model = 3
- LogLik = -881.7162
- Deviance = 997.4375
- Pearson Chi-square = 1097.0760
- AIC = 1771.4324
Practical interpretation
In this GLM:
TreatDrugis not statistically significantAgehas a small positive association with expected seizure countV4is significantly negative, meaning the count is lower whenV4 = 1
The incidence-rate-ratio interpretation is often the clearest here:
- an IRR below 1 means a lower expected count
- an IRR above 1 means a higher expected count
The negative V4 coefficient corresponds to an IRR of about 0.85, which means a lower expected seizure count for that condition.
Why move from GLM to GEE?
The GLM treats each row as an independent observation.
But in this dataset, repeated rows belong to the same subject, which means within-subject correlation is likely. That violates the independence assumption of a standard Poisson GLM.
That is why the next step is important:
use GEE to model the repeated measurements directly.
Part III — Generalized Estimating Equations (GEE)
The GEE example uses the same epilepsy dataset, but now the repeated measurements are modeled as correlated observations within subject.
The tutorial model uses:
- Outcome:
SeizureCount - Offset:
LogBase4Offset - Cluster ID:
SubjectID - Within-cluster ordering:
Period - Predictors:
TreatDrug,Age,V4
And the options are:
- Family: Poisson
- Link: Log
- Covariance structure: Exchangeable
- Standard error: Robust
This is a very useful example because it shows how to move from a simple count model to a repeated-measures marginal model while staying in Excel.
Running the GEE model in BESHStatNG
Open:
BESH Stat NG → Analyse → Regression → Generalized Estimating Equations (GEE)
Use the Analysis_Data worksheet from the workbook.
Set:
- Outcome:
SeizureCount - Offset:
LogBase4Offset - Cluster ID:
SubjectID - Within-cluster ordering:
Period - Predictors:
TreatDrug,Age,V4
In the Build Model tab:
- treat
TreatDrugas a categorical factor - keep
AgeandV4as numeric
In the Options tab:
- Family: Poisson
- Link: Log
- Covariance structure: Exchangeable
- Standard Err.: Robust
- alpha: 0.050




Interpreting the GEE model
The BESHStatNG GEE results in the workbook are:
- Intercept: -0.2925, p = 0.4458
- TreatDrug[1]: 0.2578, p = 0.1167
- Age: 0.01436, p = 0.2471
- V4: -0.18073, p = 0.00137
Model-level values:
- Family: Poisson
- Link: Log
- Dependence structure: Exchangeable
- Covariance type: Robust
- Observations: 236
- Clusters: 59
- Scale: 4.4391
- QIC: -5686.1380
- QICu: -5759.7868
- Iterations: 257
- Converged: TRUE
- Working correlation: 0.4661
Practical interpretation
The most important result here is:
- V4 remains statistically significant
TreatDrugis not statistically significantAgeis also not statistically significant
Compared with the GLM, the GEE changes the estimated treatment effect and its uncertainty because it accounts for within-subject correlation.
That is a major teaching point for this tutorial:
When repeated observations belong to the same subject, a model that assumes independence may not tell the same story as a repeated-measures model.
GLM versus GEE: what changed?
This example is useful because the GLM and GEE do not give exactly the same treatment effect.
GLM
TreatDrugcoefficient = -0.0625- p = 0.1828
GEE
TreatDrugcoefficient = 0.2578- p = 0.1167
The direction and magnitude of the treatment coefficient shift when within-subject correlation is modeled explicitly.
That is not a mistake.
It reflects a different modeling question:
- GLM gives a model under independence assumptions
- GEE gives a marginal repeated-measures model with correlation taken into account
For repeated clinical outcomes, the GEE result is often the more appropriate interpretation.
Formula-driven regression workflows with BESHStatNG
One of the strongest parts of BESHStatNG is that the same analysis can also be structured directly in worksheet formulas.
These workbooks include formula-based modeling sheets built with:
LM workbook
BESH.REGR.FORMULA_VALIDATEBESH.REGR.LM_FITBESH.REGR.LM_SUMMARYBESH.REGR.LM_TESTSBESH.REGR.LM_ANOVABESH.REGR.LM_PRED
GLM workbook
BESH.REGR.FORMULA_VALIDATEBESH.REGR.GLM_FITBESH.REGR.GLM_SUMMARYBESH.REGR.GLM_TESTSBESH.REGR.GLM_PRED
GEE workbook
BESH.REGR.FORMULA_VALIDATEBESH.REGR.GEE_FITBESH.REGR.GEE_SUMMARYBESH.REGR.GEE_TESTSBESH.REGR.GEE_WCORRBESH.REGR.GEE_PRED
This makes it possible to create reusable Excel templates where:
- the data can be replaced
- the formulas remain fixed
- the model output updates directly inside the workbook
For reporting and teaching, that is a major advantage.
Compare the Excel results with R
The workbook results can also be checked in R.
For LM and GLM, the agreement with R should usually be very close, apart from normal rounding differences.
For GEE, small differences may appear across software implementations. These can come from:
- convergence tolerances
- iteration limits
- covariance updates
- working-correlation estimation details
- small-sample corrections
- scale handling
For this tutorial, the BESHStatNG GUI results should be treated as the primary reference for the GEE example.
R reference — LM example
Use this script to reproduce the linear model from the LM workbook.
library(readxl)
dat <- read_excel(
"BESHStatNG_regression_tutorial_lm_anorexia.xlsx",
sheet = "Analysis_Data"
dat <- as.data.frame(dat)
dat$TreatLabel <- factor(dat$TreatLabel, levels = c("Cont", "CBT", "FT"))
fit_lm <- lm(Postwt ~ TreatLabel + Prewt, data = dat)
summary(fit_lm)
confint(fit_lm)
anova(fit_lm)Expected values from the workbook:
- CBT vs Cont: 4.0971, p = 0.0340
- FT vs Cont: 8.6601, p = 0.00019
- Prewt: 0.4345, p = 0.00885
- R-squared: 0.2777
R reference — Poisson GLM example
Use this script to reproduce the Poisson GLM from the epilepsy workbook.
library(readxl)
dat <- read_excel("BESHStatNG_regression_tutorial_glm_gee_epilepsy.xlsx", sheet = "Analysis_Data")
dat <- as.data.frame(dat)
dat$TreatDrug <- factor(dat$TreatDrug, levels = c(0, 1))
fit_glm <- glm(SeizureCount ~ TreatDrug + Age + V4,data = dat,family = poisson(link = "log"),offset = LogBase4Offset)
summary(fit_glm)
confint.default(fit_glm)
exp(coef(fit_glm))Expected values from the workbook:
- TreatDrug[1]: -0.0625, p = 0.1828
- Age: 0.0121, p = 0.0010
- V4: -0.1611, p = 0.0032
R reference — Poisson GEE example
Use this script to reproduce the GEE model as closely as possible in R.
Important note: for the GEE example, exact agreement between R and BESHStatNG may not be perfect. Small differences are expected across implementations because of differences in convergence tolerances, covariance updates, working-correlation estimation, scale handling, and quasi-likelihood calculations.
For this tutorial, the BESHStatNG GUI results in the workbook should be treated as the primary reference when discussing the GEE output.
library(readxl)
library(geepack)
dat <- read_excel("BESHStatNG_regression_tutorial_glm_gee_epilepsy.xlsx",sheet = "Analysis_Data")
dat <- as.data.frame(dat)
dat$TreatDrug <- factor(dat$TreatDrug, levels = c(0, 1))
fit_gee <- geeglm(SeizureCount ~ TreatDrug + Age + V4,
data = dat, id = SubjectID,waves = Period, family = poisson(link = "log"),
corstr = "exchangeable", offset = LogBase4Offset, std.err = "san.se")
summary(fit_gee)Important note for the tutorial page:
For the GEE example, exact agreement between R and BESHStatNG may not be perfect. Small differences are expected across implementations of:
- working-correlation updates
- robust covariance estimation
- convergence criteria
- scale and quasi-likelihood calculations
For this reason, the BESHStatNG output in the workbook should be used as the main reference when discussing the GEE results.
Expected BESHStatNG GEE values from this tutorial:
TreatDrug[1]: 0.2578, p = 0.1167Age: 0.01436, p = 0.2471V4: -0.18073, p = 0.00137- Working correlation: 0.4661
What this tutorial shows
This tutorial demonstrates three distinct but connected regression workflows in Excel:
- LM for continuous outcomes
- GLM for count outcomes
- GEE for repeated count outcomes
It also shows a second important theme:
BESHStatNG is not only a dialog-driven add-in.
It also supports a formula-based workbook workflow, which makes it much easier to build reusable statistical templates in Excel.
That combination is one of the strongest practical advantages of BESHStatNG.