Regression and UDF workflows in Excel

Linear regression, Poisson GLM, and repeated-measures GEE using workbook-based Excel workflows

BESHStatNG makes it possible to run a broad range of regression workflows directly in Excel, from standard linear models to generalized linear models and repeated-measures models with generalized estimating equations.

This tutorial introduces three practical workflows:

  • Multiple Linear Regression (LM) for a continuous outcome
  • Generalized Linear Model (GLM) for count data with a Poisson log-link and offset
  • Generalized Estimating Equations (GEE) for repeated count data with within-subject correlation

The examples are built around real treatment-study datasets and are designed to show both sides of the BESHStatNG workflow:

  • the GUI-based analysis
  • the worksheet UDF workflow using BESH.REGR.* formulas

The goal is not only to show where to click, but also how to organize reusable Excel workbooks for modeling, checking results, and building template-based analyses.

Download the tutorial workbooks

This tutorial uses two workbook examples.

1. Linear regression workbook

[Download the LM workbook]

This workbook uses a treatment-study dataset with pre-treatment and post-treatment body weight measurements. It demonstrates an ANCOVA-style linear regression workflow with:

  • Postwt as the dependent variable
  • treatment group as a categorical predictor
  • baseline weight (Prewt) as a covariate

2. GLM and GEE workbook

[Download the GLM and GEE workbook]

This workbook uses repeated seizure-count data from an epilepsy treatment study. It demonstrates:

  • Poisson GLM with log link and offset
  • Poisson GEE with exchangeable working correlation and robust standard errors

Each workbook contains:

  • raw data
  • cleaned analysis data
  • BESHStatNG UDF formulas
  • reference result sheets
  • GUI placeholder sheets for screenshots and final output discussion

Why these examples were chosen

These two workbook examples show three distinct modeling situations that Excel users often need in practice:

  • LM when the outcome is continuous
  • GLM when the outcome is a count and an offset is needed
  • GEE when repeated measurements from the same subject must be modeled without assuming independence

Together, they show how BESHStatNG supports both:

  • one-time dialog-driven analysis
  • reusable formula-driven workbook design

That is one of the strongest reasons to use BESHStatNG instead of relying only on ad hoc Excel workflows.

Part I — Multiple Linear Regression (LM)

The first example uses a treatment-study dataset with three treatment groups:

  • Cont
  • CBT
  • FT

The tutorial model uses:

  • Dependent variable: Postwt
  • Predictors: TreatCode and Prewt

This is an ANCOVA-style model: post-treatment weight is modeled as a function of baseline weight and treatment group.

That makes this a good first regression tutorial because it answers a simple practical question:

After adjusting for baseline weight, do the treatment groups differ in post-treatment weight?

Running the linear model in BESHStatNG

Open:

BESH Stat NG → Analyse → Regression → Multiple Linear Regression (LM)

Use the Analysis_Data worksheet from the workbook.

Set:

  • Outcome: Postwt
  • Predictors: TreatCode, Prewt

Then go to the Build Model tab and define:

  • TreatCode as a categorical factor
  • Prewt as a continuous predictor
  • include the intercept

In the Options tab for this example:

  • use Type III Sum-of-Squares
  • compute residuals

 

BESH Stat NG ribbon showing the Regression menu and Multiple Linear Regression (LM).

 

LM input dialog using Postwt as the response and TreatCode plus Prewt as predictors.

 

LM build model screen with TreatCode treated as a categorical factor and Prewt as a continuous effect.

 

LM options showing Type III sums of squares and residual computation.

Interpreting the linear model

The workbook results show the following coefficient estimates:

  • Intercept: 45.6740, p = 0.00095
  • TreatCode[2] (CBT vs Cont): 4.0971, p = 0.0340
  • TreatCode[3] (FT vs Cont): 8.6601, p = 0.00019
  • Prewt: 0.4345, p = 0.00885

Model-level results:

  • N = 72
  • R-squared = 0.2777
  • Adjusted R-squared = 0.2458
  • F-statistic = 8.7126
  • Model p-value = 0.000057

Practical interpretation

This model suggests that, after adjusting for baseline weight:

  • the CBT group has a significantly higher post-treatment weight than the control group
  • the FT group has an even larger positive difference relative to control
  • higher baseline weight is also associated with higher post-treatment weight

In plain language, both treatment effects are positive after controlling for starting weight, and the FT effect is the strongest of the two.

LM workbook and UDF workflow

The LM workbook also includes a formula-based modeling workflow using BESHStatNG UDFs.

On the BESH_UDFs sheet, the model is built with (see documentation):

  • BESH.REGR.FORMULA_VALIDATE
  • BESH.REGR.LM_FIT
  • BESH.REGR.LM_SUMMARY
  • BESH.REGR.LM_TESTS
  • BESH.REGR.LM_ANOVA
  • BESH.REGR.LM_PRED

This makes it possible to keep the analysis logic directly in worksheet cells rather than only inside GUI dialogs.

That is particularly useful when you want to build Excel-based reporting templates or re-run the same analysis structure on updated data.

Part II — Generalized Linear Model (GLM)

The second example uses repeated seizure-count data from an epilepsy treatment study.

For the GLM example, each two-week observation is treated as a separate count outcome, and the model includes:

  • Dependent variable: SeizureCount
  • Predictors: TreatDrug, Age, V4
  • Offset: LogBase4Offset

This is a Poisson regression with log link.

The offset is included to account for baseline seizure exposure:
log(Base8wk / 4)

This is a useful tutorial example because it shows how BESHStatNG handles a practical count-data workflow in Excel.

Running the Poisson GLM in BESHStatNG

Open:

BESH Stat NG → Analyse → Regression → Generalized Linear Models (GLM)

Use the Analysis_Data worksheet from the workbook.

Set:

  • Outcome: SeizureCount
  • Offset: LogBase4Offset
  • Predictors: TreatDrug, Age, V4

In the Build Model tab:

  • set TreatDrug as a categorical factor
  • keep Age and V4 as numeric effects

In the Options tab:

  • Family: Poisson
  • Link: Log

 

BESH Stat NG ribbon showing the Generalized Linear Models (GLM) option.

 

GLM input dialog using seizure count as outcome, offset, and three predictors.

 

GLM build model screen with TreatDrug as a categorical factor and Age plus V4 as numeric effects.

 

GLM options showing Poisson family and Log link.

Interpreting the Poisson GLM

The workbook GLM results are:

  • Intercept: -0.2059, p = 0.0702
  • TreatDrug[1]: -0.0625, p = 0.1828
  • Age: 0.0121, p = 0.0010
  • V4: -0.1611, p = 0.0032

Exponentiated effects:

  • TreatDrug IRR: 0.939
  • Age IRR: 1.012
  • V4 IRR: 0.851

Model-level values:

  • N = 236
  • Df Model = 3
  • LogLik = -881.7162
  • Deviance = 997.4375
  • Pearson Chi-square = 1097.0760
  • AIC = 1771.4324

Practical interpretation

In this GLM:

  • TreatDrug is not statistically significant
  • Age has a small positive association with expected seizure count
  • V4 is significantly negative, meaning the count is lower when V4 = 1

The incidence-rate-ratio interpretation is often the clearest here:

  • an IRR below 1 means a lower expected count
  • an IRR above 1 means a higher expected count

The negative V4 coefficient corresponds to an IRR of about 0.85, which means a lower expected seizure count for that condition.

Why move from GLM to GEE?

The GLM treats each row as an independent observation.

But in this dataset, repeated rows belong to the same subject, which means within-subject correlation is likely. That violates the independence assumption of a standard Poisson GLM.

That is why the next step is important:
use GEE to model the repeated measurements directly.

Part III — Generalized Estimating Equations (GEE)

The GEE example uses the same epilepsy dataset, but now the repeated measurements are modeled as correlated observations within subject.

The tutorial model uses:

  • Outcome: SeizureCount
  • Offset: LogBase4Offset
  • Cluster ID: SubjectID
  • Within-cluster ordering: Period
  • Predictors: TreatDrug, Age, V4

And the options are:

  • Family: Poisson
  • Link: Log
  • Covariance structure: Exchangeable
  • Standard error: Robust

This is a very useful example because it shows how to move from a simple count model to a repeated-measures marginal model while staying in Excel.

Running the GEE model in BESHStatNG

Open:

BESH Stat NG → Analyse → Regression → Generalized Estimating Equations (GEE)

Use the Analysis_Data worksheet from the workbook.

Set:

  • Outcome: SeizureCount
  • Offset: LogBase4Offset
  • Cluster ID: SubjectID
  • Within-cluster ordering: Period
  • Predictors: TreatDrug, Age, V4

In the Build Model tab:

  • treat TreatDrug as a categorical factor
  • keep Age and V4 as numeric

In the Options tab:

  • Family: Poisson
  • Link: Log
  • Covariance structure: Exchangeable
  • Standard Err.: Robust
  • alpha: 0.050
BESH Stat NG ribbon showing the Generalized Estimating Equations (GEE) option.

 

GEE input dialog with seizure count, offset, subject ID, within-cluster ordering, and predictors.

 

GEE build model screen with TreatDrug as a categorical predictor and Age plus V4 as numeric effects.

 

GEE options using Poisson family, log link, exchangeable correlation, and robust covariance.

Interpreting the GEE model

The BESHStatNG GEE results in the workbook are:

  • Intercept: -0.2925, p = 0.4458
  • TreatDrug[1]: 0.2578, p = 0.1167
  • Age: 0.01436, p = 0.2471
  • V4: -0.18073, p = 0.00137

Model-level values:

  • Family: Poisson
  • Link: Log
  • Dependence structure: Exchangeable
  • Covariance type: Robust
  • Observations: 236
  • Clusters: 59
  • Scale: 4.4391
  • QIC: -5686.1380
  • QICu: -5759.7868
  • Iterations: 257
  • Converged: TRUE
  • Working correlation: 0.4661

Practical interpretation

The most important result here is:

  • V4 remains statistically significant
  • TreatDrug is not statistically significant
  • Age is also not statistically significant

Compared with the GLM, the GEE changes the estimated treatment effect and its uncertainty because it accounts for within-subject correlation.

That is a major teaching point for this tutorial:

When repeated observations belong to the same subject, a model that assumes independence may not tell the same story as a repeated-measures model.

GLM versus GEE: what changed?

This example is useful because the GLM and GEE do not give exactly the same treatment effect.

GLM

  • TreatDrug coefficient = -0.0625
  • p = 0.1828

GEE

  • TreatDrug coefficient = 0.2578
  • p = 0.1167

The direction and magnitude of the treatment coefficient shift when within-subject correlation is modeled explicitly.

That is not a mistake.
It reflects a different modeling question:

  • GLM gives a model under independence assumptions
  • GEE gives a marginal repeated-measures model with correlation taken into account

For repeated clinical outcomes, the GEE result is often the more appropriate interpretation.

Formula-driven regression workflows with BESHStatNG

One of the strongest parts of BESHStatNG is that the same analysis can also be structured directly in worksheet formulas.

These workbooks include formula-based modeling sheets built with:

LM workbook

  • BESH.REGR.FORMULA_VALIDATE
  • BESH.REGR.LM_FIT
  • BESH.REGR.LM_SUMMARY
  • BESH.REGR.LM_TESTS
  • BESH.REGR.LM_ANOVA
  • BESH.REGR.LM_PRED

GLM workbook

  • BESH.REGR.FORMULA_VALIDATE
  • BESH.REGR.GLM_FIT
  • BESH.REGR.GLM_SUMMARY
  • BESH.REGR.GLM_TESTS
  • BESH.REGR.GLM_PRED

GEE workbook

  • BESH.REGR.FORMULA_VALIDATE
  • BESH.REGR.GEE_FIT
  • BESH.REGR.GEE_SUMMARY
  • BESH.REGR.GEE_TESTS
  • BESH.REGR.GEE_WCORR
  • BESH.REGR.GEE_PRED

This makes it possible to create reusable Excel templates where:

  • the data can be replaced
  • the formulas remain fixed
  • the model output updates directly inside the workbook

For reporting and teaching, that is a major advantage.

Compare the Excel results with R

The workbook results can also be checked in R.

For LM and GLM, the agreement with R should usually be very close, apart from normal rounding differences.

For GEE, small differences may appear across software implementations. These can come from:

  • convergence tolerances
  • iteration limits
  • covariance updates
  • working-correlation estimation details
  • small-sample corrections
  • scale handling

For this tutorial, the BESHStatNG GUI results should be treated as the primary reference for the GEE example.

R reference — LM example

Use this script to reproduce the linear model from the LM workbook.

library(readxl)
dat <- read_excel(
"BESHStatNG_regression_tutorial_lm_anorexia.xlsx",
sheet = "Analysis_Data"
dat <- as.data.frame(dat)
dat$TreatLabel <- factor(dat$TreatLabel, levels = c("Cont", "CBT", "FT"))
fit_lm <- lm(Postwt ~ TreatLabel + Prewt, data = dat)
summary(fit_lm)
confint(fit_lm)
anova(fit_lm)

Expected values from the workbook:

  • CBT vs Cont: 4.0971, p = 0.0340
  • FT vs Cont: 8.6601, p = 0.00019
  • Prewt: 0.4345, p = 0.00885
  • R-squared: 0.2777

R reference — Poisson GLM example

Use this script to reproduce the Poisson GLM from the epilepsy workbook.

library(readxl)
dat <- read_excel("BESHStatNG_regression_tutorial_glm_gee_epilepsy.xlsx", sheet = "Analysis_Data")
dat <- as.data.frame(dat)
dat$TreatDrug <- factor(dat$TreatDrug, levels = c(0, 1))
fit_glm <- glm(SeizureCount ~ TreatDrug + Age + V4,data = dat,family = poisson(link = "log"),offset = LogBase4Offset)
summary(fit_glm)
confint.default(fit_glm)
exp(coef(fit_glm))

Expected values from the workbook:

  • TreatDrug[1]: -0.0625, p = 0.1828
  • Age: 0.0121, p = 0.0010
  • V4: -0.1611, p = 0.0032

R reference — Poisson GEE example

Use this script to reproduce the GEE model as closely as possible in R.

Important note: for the GEE example, exact agreement between R and BESHStatNG may not be perfect. Small differences are expected across implementations because of differences in convergence tolerances, covariance updates, working-correlation estimation, scale handling, and quasi-likelihood calculations.

For this tutorial, the BESHStatNG GUI results in the workbook should be treated as the primary reference when discussing the GEE output.

library(readxl)
library(geepack)
dat <- read_excel("BESHStatNG_regression_tutorial_glm_gee_epilepsy.xlsx",sheet = "Analysis_Data")
dat <- as.data.frame(dat)
dat$TreatDrug <- factor(dat$TreatDrug, levels = c(0, 1))
fit_gee <- geeglm(SeizureCount ~ TreatDrug + Age + V4,
data = dat, id = SubjectID,waves = Period, family = poisson(link = "log"),
corstr = "exchangeable", offset = LogBase4Offset, std.err = "san.se")
summary(fit_gee)

Important note for the tutorial page:

For the GEE example, exact agreement between R and BESHStatNG may not be perfect. Small differences are expected across implementations of:

  • working-correlation updates
  • robust covariance estimation
  • convergence criteria
  • scale and quasi-likelihood calculations

For this reason, the BESHStatNG output in the workbook should be used as the main reference when discussing the GEE results.

Expected BESHStatNG GEE values from this tutorial:

  • TreatDrug[1]: 0.2578, p = 0.1167
  • Age: 0.01436, p = 0.2471
  • V4: -0.18073, p = 0.00137
  • Working correlation: 0.4661

What this tutorial shows

This tutorial demonstrates three distinct but connected regression workflows in Excel:

  • LM for continuous outcomes
  • GLM for count outcomes
  • GEE for repeated count outcomes

It also shows a second important theme:

BESHStatNG is not only a dialog-driven add-in.
It also supports a formula-based workbook workflow, which makes it much easier to build reusable statistical templates in Excel.

That combination is one of the strongest practical advantages of BESHStatNG.