Ordinal and Multinomial Logistic Regression in Excel

BESH Stat NG v0.5.8 adds two important capabilities for applied regression work in Excel.

First, it introduces ordinal logistic regression and multinomial logistic regression workflows that can be used directly from worksheet formulas. Second, it adds the new formula= argument to regression UDFs. Now, it is possible to specify richer model directly in the worksheet, including categorical factors, polynomial terms, and interaction terms.

The practical goal is simple. It is to help users stay in a familiar Excel environment and focus on the statistical analysis itself, instead of spending time learning a new language or navigating a more complex software workflow.

In this post, I will demonstrate the new functionality with two examples:

  • Ordinal logistic regression on the Rheumatoid self-assessment scores measured on a five-level ordinal response scale at three follow-up times. Data from the Lipsitz, S.R. and Kim, K. and Zhao, L. (1994). Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine, 13 , 1149–1163. arthritis5.csv
  • Multinomial logistic regression on the diabetes data from Reaven, G. M. and Miller, R. G. (1979). An attempt to define the nature of chemical diabetes using a multidimensional analysis. Diabetologia 16, 17–24. Andrews, D. F. and Herzberg, A. M. (1985). Data: A Collection of Problems from Many Fields for the Student and Research Worker, Springer-Verlag, Ch. 36. diabetes.csv

For each example, I will show both the graphical interface workflow and the UDF-based worksheet workflow.

What is new in BESH Stat NG v0.5.8

This release introduces the following regression-related additions:

  • Model tabs on all regression forms now support categorical factors, polynomial terms, and continuous-variable interaction terms
  • A new formula= argument in regression UDFs
  • A regression-formula validation helper UDF
  • batch #3 of new excel UDFs, for list of all updates see.

The main idea behind the new formula= argument is to make worksheet-based model specification more transparent and more reproducible. Instead of only passing a raw predictor matrix into a UDF and relying on column order alone, the model structure can now be stated explicitly.

For example, the following formula specifies a model with a continuous predictor, two categorical predictors, and one additional numeric predictor:

'age' + factor('sex', ref=1) + factor('trt', ref=1) + 'baseline'

The same syntax can also express polynomial and interaction terms:

'age' + 'age'^2
'glucose' + 'insulin':'sspg'
'age' + factor('baseline', ref=1)

This makes it easier to build, compare, and document regression models directly inside an Excel workbook.

The new formula= argument in regression UDFs

The new formula= parameter controls the right-hand side of the regression model.

Supported syntax

The current syntax supports:

  • Main effects
    'age' + 'baseline'
  • Polynomial terms
    'age' + 'age'^2
  • Interaction terms
    'glucose' + 'insulin':'sspg'
  • Categorical factor terms
    factor('sex', ref=1)
    factor('baseline', ref=1)

formulaAddressing options

BESH Stat NG currently supports three addressing modes for formulas:

  • "names"
    Variable names are written in single quotes, for example:

    'prison' + 'dose' + 'dose'^2
  • "relative"
    Variable references use letters relative to the supplied X range:

    A + B + C^2 + B:C
  • "absolute"
    Variable references use worksheet column letters:

    C + D + factor(F, ref=1)

In this post I mainly use "names", because it is the clearest form for teaching and for reading formulas later.

Notes and current limitations

  • factor(...) is intended for categorical main effects
  • interaction terms are currently most useful for continuous variables
  • variable names in "names" mode use single quotes
  • names containing apostrophes should escape them by doubling them, for example:
    'Children''s dose'

For full details, see the dedicated regression-formula help page in the BESH Stat NG documentation.

Example 1: Ordinal logistic regression in Excel

For the ordinal logistic regression example, I used the arthritis dataset and focused on the 5-month follow-up observations. The response variable y is ordinal, which makes ordinal logistic regression a natural modeling choice.

The first model uses:

  • age as a continuous predictor
  • sex as a categorical predictor
  • trt as a categorical predictor
  • baseline as a numeric predictor

The model can be written as:

'age' + factor('sex', ref=1) + factor('trt', ref=1) + 'baseline'

Ordinal logistic regression is used when the outcome variable has categories with a natural order, but the distances between those categories is unequal. In the arthritis example, the response variable y records ordered outcome levels. It makes sense to model higher versus lower outcome categories while preserving that ordering information.

This is different from ordinary linear regression, where the outcome is numeric, and also different from multinomial logistic regression, where outcome categories are not assumed to have any meaningful order. Ordinal logistic regression is often a good choice for questionnaire scores, ordered clinical ratings, satisfaction levels, symptom severity scales, and other outcomes where categories run from lower to higher levels.

In practical terms, the model estimates how the predictors are associated with the odds of being in a higher outcome category rather than a lower one. A positive coefficient means that larger predictor values are linked to higher outcome levels. A negative coefficient means the opposite. When odds ratios are reported, values above 1 indicate higher odds of being in a higher category. Values below 1 indicate lower odds.

Ordinal logistic regression through the graphical interface

The graphical workflow starts by selecting the outcome and the predictor variables.

Step 1: Select the outcome and predictors

In the Select Variables tab, the outcome variable is set to y, and the predictors are age, sex, trt, and baseline.

BESH Stat NG ordinal logistic regression dialog showing y as the outcome and age, sex, trt, and baseline as predictors.
The ordinal logistic regression dialog in BESH Stat NG with y as the outcome and age, sex, trt, and baseline selected as predictors.

Step 2: Build the model

In the Build Model tab, sex and trt are added as categorical factors, while age and baseline are kept as numeric terms.

BESH Stat NG build model tab showing age and baseline as numeric effects and sex and trt as categorical factors.
In the model-building tab, sex and trt are entered as categorical factors, while age and baseline remain numeric.

Step 3: Fit and interpret the model

The fitted model shows that:

  • baseline is strongly associated with the outcome
  • trt is also statistically significant
  • age is weaker in this model
  • sex is not clearly associated with the outcome in this specification

Worksheet output from BESH Stat NG ordinal logistic regression showing coefficients, standard errors, odds ratios, and model statistics.
The ordinal logistic regression output shows a clear association for baseline and trt, while age is weaker and sex is not statistically significant in this model.

This is a useful introductory example because it demonstrates how to interpret a model where not every predictor is equally important. A good first step is to look at the coefficient table together with the p-values and odds ratios. In this model, baseline and trt stand out most clearly. Their p-values are small, which suggests that these predictors are associated with the ordinal outcome even after accounting for the other variables in the model.

The odds-ratio column helps translate the coefficients into a more intuitive scale. For baseline, the odds ratio is greater than 1, which means that higher baseline values are associated with higher odds of being in a higher outcome category. In other words, patients with higher baseline scores tend to shift toward higher outcome levels in this fitted model. For trt, the odds ratio is also above 1, suggesting that the treatment-coded group has higher odds of being in a higher outcome category than the reference group, assuming the coding is interpreted in that direction.

By contrast, sex does not show clear evidence of association here, because its p-value is not small and its confidence interval for the odds ratio is wide and includes 1. That means the data in this model do not provide strong evidence that outcome levels differ systematically by sex after adjusting for the other predictors. The age effect is weaker as well: its estimated association is negative, but the p-value is not low enough to treat it as a clearly supported effect in this analysis.

It is also useful to distinguish between statistical significance and practical interpretation. A predictor with a small p-value suggests evidence of association, but the odds ratio and its confidence interval help show the direction and approximate size of that association. At the same time, a predictor with a non-significant result should not automatically be called “unimportant”; it is more accurate to say that this particular dataset and model do not provide strong evidence for an effect.

Finally, the model-level statistics tell us whether the fitted model improves on a null model with no predictors. Here the likelihood ratio test is highly significant, which indicates that the predictors taken together improve model fit. The pseudo-R2 values should be read as descriptive measures of fit rather than as direct analogues of the ordinary least-squares R2, but they can still be useful for comparing related models. Taken together, this output suggests that baseline and trt are the main drivers in this first ordinal logistic example, while age and sex contribute less clearly.

The same ordinal logistic model with worksheet UDFs

The same model can also be fitted directly in a worksheet formula.

A corresponding UDF call is:

=BESH.REGR.ORDLOGIT_FIT(C:C,D:G,D1:G1,,,, "'age' + factor('sex', ref=1) + factor('trt', ref=1) + 'baseline'", "names")

This approach makes the model structure visible in the worksheet itself, which is especially useful when building templates or sharing reproducible analyses.

Excel worksheet showing BESH.REGR.ORDLOGIT_FIT and BESH.REGR.ORDLOGIT_SUMMARY formulas with the corresponding ordinal logistic results.
The same ordinal logistic model can be fitted directly with UDFs, making the model formula explicit in the worksheet.

The GUI and UDF outputs tell the same statistical story, but the worksheet version makes the model specification fully transparent.

Extending the ordinal model with the new formula= syntax

One of the main advantages of the new formula system is that it makes it easy to try alternative model structures without leaving the worksheet.

Adding a polynomial term

For example, a quadratic age term can be added as:

'age' + 'age'^2 + factor('sex', ref=1) + factor('trt', ref=1) + 'baseline'

Worksheet output for an ordinal logistic regression model including age and age squared.
The formula= syntax makes it easy to test nonlinear terms such as age^2 directly in the worksheet.

In this example, the age-squared term does not appear to improve the model very much. This is a useful reminder that richer syntax makes more flexible models possible, but the data still determine whether the added complexity is worthwhile.

Treating baseline as a categorical factor

The same syntax also allows a coded predictor to be treated as a factor instead of as a numeric trend term:

'age' + factor('sex', ref=1) + factor('trt', ref=1) + factor('baseline', ref=1)

Or, using worksheet-column addressing:

E + factor(D, ref=1) + factor(F, ref=1) + factor(G, ref=1)

with formulaAddressing="absolute".

Worksheet output showing an ordinal logistic regression model where baseline is treated as a categorical factor using absolute worksheet-column references.
The factor(...) syntax can be used to treat a coded predictor such as baseline as a categorical factor rather than as a numeric trend.

This is one of the most useful additions in practice because many worksheet datasets contain variables that are coded with numbers, even though those numbers do not necessarily represent a true numeric scale. A variable such as baseline may be stored as 1, 2, 3, or 4, but the analyst may not want to assume that the difference between 1 and 2 has the same meaning as the difference between 3 and 4. In one analysis, it may be reasonable to treat such a variable as numeric and estimate a simple trend effect. In another, it may be more appropriate to treat the same variable as categorical and allow each level to have its own relationship with the outcome.

Before the addition of formula=, making that choice in a worksheet-based workflow often required extra manual preparation, such as recoding variables, creating dummy columns, or rearranging the predictor matrix. With the new syntax, the same source data can be used in both ways simply by changing the model formula. That means the analyst can move from 'baseline' to factor('baseline') or explicitly as factor('baseline', ref=1) without rebuilding the worksheet layout or duplicating the dataset.

This is especially convenient when comparing alternative model specifications. It becomes much easier to ask practical modeling questions such as: should this predictor be treated as an ordered numeric effect, or should each category be estimated separately? Does the simpler trend-based model capture the relationship well enough, or does the categorical representation reveal a more complex pattern? Because these alternatives can now be tested directly in the formula itself, the workbook becomes more flexible and the modeling decisions become easier to document and review.

Example 2: Multinomial logistic regression in Excel

For the multinomial logistic regression example, I used the diabetes dataset. The response has multiple categories without a natural ordering, which makes multinomial logistic regression the appropriate method.

The first model uses three predictors:

'glucose' + 'insulin' + 'sspg'

Multinomial logistic regression is used when the outcome variable has more than two categories and those categories do not have a natural ordering. In the diabetes example, the response variable separates individuals into distinct groups, but those groups are not arranged on a simple low-to-high scale. That makes multinomial logistic regression an appropriate choice.

The model works by comparing each non-reference category with a chosen reference category. Instead of estimating one single set of coefficients for the whole outcome, it estimates a separate set of regression coefficients for each category comparison. This allows the relationship between the predictors and the outcome to differ across categories, which is often important when the groups represent genuinely different clinical or biological states.

In practical terms, the coefficients describe how the predictors change the log-odds of belonging to one category rather than the reference category. When odds ratios are shown, values above 1 indicate higher odds of being in the category of interest relative to the reference category, while values below 1 indicate lower odds. Because the model is category-specific, the same predictor may have different effects in different outcome comparisons.

Multinomial logistic regression through the graphical interface

Step 1: Select the outcome and predictors

In the Select Variables tab, the outcome is set to the numeric group variable, and the predictors are rw, fpg, glucose, insulin, and sspg. For the first model, the final selected effects are glucose, insulin, and sspg.

BESH Stat NG multinomial logistic regression dialog showing the diabetes worksheet with the groupn outcome and predictor candidates selected.
In the multinomial logistic regression dialog, groupn is used as the outcome and the predictor variables are selected from the worksheet.

Step 2: Build the model

In the Build Model tab, the selected effects are set to glucose, insulin, and sspg.

BESH Stat NG build model tab for multinomial logistic regression showing glucose, insulin, and sspg as selected effects.
The initial multinomial logistic model uses glucose, insulin, and sspg as predictors.

Step 3: Review options

The Options tab allows control over convergence settings, residual computation, covariance matrix output, and the reference category.

BESH Stat NG multinomial logistic regression options tab showing convergence settings, residual options, and reference-category selection.
The options tab provides control over convergence settings and the reference category used in the multinomial model.

Step 4: Fit and interpret the model

The first multinomial model shows a clearly significant overall fit. It also provides a useful teaching example because some predictors are more informative than others, and the coefficient tables are split by category comparison.

Worksheet output from BESH Stat NG multinomial logistic regression showing category-specific coefficients, odds ratios, and model statistics.
The multinomial logistic output reports separate coefficient tables for each non-reference category and includes likelihood-based model statistics.

This first multinomial logistic model gives a good example of how to read category-specific regression output. A useful place to begin is with the overall likelihood ratio test, because it tells us whether the fitted model improves on a model with no predictors. In this example, the test is clearly significant, which indicates that the predictors taken together help distinguish between the outcome categories.

The next step is to read the coefficient tables one category comparison at a time. In multinomial logistic regression, each non-reference category is compared with the chosen reference category, so the same predictor can have a different estimated effect in different comparisons. A positive coefficient means that higher values of the predictor are associated with higher odds of being in that category rather than the reference category, while a negative coefficient means lower odds. When odds ratios are reported, values above 1 indicate increased odds of membership in the category of interest relative to the reference category, and values below 1 indicate decreased odds.

In this model, some predictors appear more informative than others. For example, glucose shows one of the clearest associations in at least one of the category comparisons, while other predictors are weaker or less consistent across the output. That is a common pattern in multinomial models: a variable may be strongly related to one category comparison but not to another. Because of this, it is important not to summarize the whole model with a single statement about each predictor without first checking the category-specific results.

As with other regression models, p-values, odds ratios, and confidence intervals should be interpreted together. A small p-value suggests evidence of association for that specific category comparison, while the odds ratio gives the direction and approximate size of the effect. A wide confidence interval, especially one that includes 1, suggests greater uncertainty. In practice, this first multinomial example shows that the model is useful overall and that at least some of the predictors help distinguish between the diabetes groups, while also illustrating that predictor importance can vary from one category comparison to another.

The same multinomial logistic model with worksheet UDFs

The same model can also be created directly with a UDF:

=BESH.REGR.MNLOGIT_FIT(G:G,A:E,A1:E1,,,, "'glucose' + 'insulin' + 'sspg'", "names",,0.000001)

This makes the selected predictors explicit in the worksheet formula itself.

Excel worksheet showing BESH.REGR.MNLOGIT_FIT and BESH.REGR.MNLOGIT_SUMMARY formulas with the corresponding multinomial regression results.
The same multinomial logistic model can be fitted directly from worksheet formulas using the new regression UDF workflow.

Again, the worksheet formula makes the model easier to inspect and reuse. Note that there are more function related to each regression model than presented here. For the full list see documentation.

A richer multinomial model using polynomial and interaction terms

One of the main new capabilities in this release is the ability to specify richer regression models directly inside the worksheet formula.

For example:

'glucose' + 'insulin' + 'sspg' + 'rw' + 'fpg' + 'fpg'^2 + 'insulin':'sspg'

This extended model includes:

  • main effects
  • a polynomial term (fpg^2)
  • an interaction term (insulin:sspg)

Worksheet output showing a multinomial logistic regression model with additional predictors, a squared fpg term, and an insulin by sspg interaction.
The new formula= syntax allows richer multinomial models to be expressed directly in a worksheet, including polynomial and interaction terms.

This extended example is useful primarily as a demonstration of what the new formula= syntax can express. It shows that the worksheet formula can now describe a much richer model structure directly inside Excel.

In practice, more elaborate models should be checked carefully for coefficient stability, standard errors, and substantive plausibility. In this extended multinomial example, there are warning signs that the fitted model may be too ambitious for the amount of information available in the data. Some coefficients are very large in magnitude, some standard errors are extremely large, and at least one reported odds ratio is effectively infinite. When an odds ratio appears as Inf (or is so large that it is practically unbounded), that usually means the fitted coefficient is extremely large and the model is struggling to estimate that effect in a stable way.

Results like these often suggest that the model is close to a separation problem or that some combination of predictors is making one outcome category almost perfectly distinguishable from another. In that situation, the software may still return estimates, but those estimates should not be interpreted in the same way as a stable, well-behaved model. Very large standard errors and extremely wide confidence intervals are a sign that the data do not support precise estimation of those effects, even if the fitted model formally converges.

From a practical point of view, this is a reminder that richer syntax makes more complex models possible, but it does not remove the need for statistical judgment. When polynomial or interaction terms lead to unstable coefficients, infinite odds ratios, or implausibly large effect estimates, it is often worth stepping back to a simpler specification and asking whether the additional terms are really supported by the data. In this example, the extended model is still valuable as a demonstration of what the new formula= syntax can express, but it should also be read as a lesson in checking model stability rather than as a final substantive model to interpret without caution.

GUI workflows and UDF workflows in practice

BESH Stat NG now supports both styles of work comfortably.

Use the graphical interface when

  • you want guided model setup
  • exploring different model structures
  • results quickly with a few clicks
  • you prefer selecting variables and effects interactively

Use the UDFs when

  • you want transparent, reproducible models in the worksheet
  • formulas visible directly in cells
  • to compare multiple model specifications side by side
  • to build reusable workbook templates

The new formula= argument makes the UDF path especially convenient for users who want the model specification to remain visible and editable inside the workbook.

Final remarks

BESH Stat NG v0.5.8 extends Excel-based regression analysis in two important ways.

It introduces a more expressive way to specify regression models through the new formula= argument. This means that factors, polynomial terms, and interaction terms can now be incorporated directly into worksheet-based regression UDFs, while similar model-building options are now also available through the graphical forms.

For users who already work comfortably in Excel, this makes it easier to move from simple models to richer regression specifications without changing analysis environments.

If you would like to try the new release, download BESH Stat NG here:

Download:
https://beshstat.eu/download-ng/

For more details, see the BESH Stat NG documentation, including the pages for:

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.