Survival Analysis in Excel with BESH Stat NG: Kaplan–Meier, Log-Rank, and Cox Regression

This tutorial shows how to analyze time-to-event data (survival analysis) in Excel with BESH Stat NG using a real medical dataset. We compare clinics with Kaplan–Meier curves and the log-rank test, then fit a Cox proportional hazards model with clinic, prison record, and methadone dose as predictors.

Survival analysis is one of the most useful statistical tools in medical research, especially when the question is not just whether an event happens, but when it happens. In many real studies, some participants experience the event during follow-up, while others do not. That is exactly the kind of situation where standard methods such as simple means or ordinary linear regression stop being appropriate.

In this tutorial, I will use BESH Stat NG to analyze a classic methadone maintenance dataset in Excel. The example is practical rather than theoretical: we will compare patient retention between two clinics using Kaplan–Meier curves and the log-rank test, and then move to Cox proportional hazards regression to adjust for additional predictors.

The goal is simple: show how medical researchers and students can do a serious survival analysis workflow in Excel using both the Windows-form interface and the survival UDFs available in BESH Stat NG.

You can download BESH Stat NG, explore the Kaplan–Meier documentation, the log-rank test documentation, the Cox regression documentation, and the survival UDF documentation.

Example dataset

This example uses a dataset from a study by Caplehorn et al. on methadone maintenance treatment coxdata.csv. Each row represents one patient. The key variables are:

  • ID: subject identifier
  • clinic: clinic 1 or clinic 2
  • status: 0 = censored, 1 = departed from clinic
  • survt: observed time in days
  • prison: 0 = no prison record, 1 = any prison record
  • dose: methadone dose in mg/day

A quick but important note: in this article, “survival” does not mean survival in the mortality sense. Here, survival means remaining in treatment, and the event is departure from the clinic.

Excel worksheet showing the methadone clinic dataset with columns ID, clinic, status, survt, prison, and dose.
Methadone clinic dataset loaded into Excel, with clinic, status, survival time, prison record, and dose columns.

This is a very good teaching dataset because it supports three natural research questions:

  1. Do the two clinics differ in retention over time?
  2. Is any observed difference statistically significant?
  3. After adjustment for prison record and methadone dose, is clinic still associated with time to departure?

Why survival analysis is needed here

Suppose we compared average follow-up time between the clinics and stopped there. That would ignore censoring. Some patients were still in treatment when observation ended, so their final event time is unknown. We only know that their retention time is at least as long as the observed follow-up.

That is why survival methods are the right choice. They allow us to use all available follow-up information without pretending that censored observations are complete events.

Starting the analysis in BESH Stat NG

BESH Stat NG places survival analysis directly inside Excel, which makes it approachable for users who want a point-and-click workflow without leaving the spreadsheet environment.

BESH Stat NG Excel ribbon menu expanded to show Kaplan-Meier Plot, Logrank Test, and Cox Regression.
BESH Stat NG ribbon menu showing the Survival Analysis tools available in Excel.

From the ribbon, the relevant items for this tutorial are:

  • Kaplan-Meier Plot
  • Logrank Test
  • Cox Regression

This is already a nice design choice for teaching, because the workflow follows the way many analysts actually think:
start with descriptive survival curves, then test group differences, then fit an adjusted model.

Kaplan–Meier analysis by clinic

The first step is to compare retention in clinic 1 and clinic 2 visually.

In the Kaplan–Meier dialog, set:

  • Survival Times = survt
  • Censorship identifier = status
  • Group ID = clinic
Kaplan-Meier dialog in BESH Stat NG with survival time, censoring variable, and clinic group selected.
Kaplan–Meier input dialog with survival time, censoring indicator, and clinic group selected.
Kaplan-Meier options dialog showing detailed output and confidence interval settings.
Kaplan–Meier options used for this example, including detailed output and 95% confidence intervals.

After running the procedure, BESH Stat NG produces both a plot and a detailed tabular output.

Kaplan-Meier plot comparing clinic 1 and clinic 2 retention over time in days.
Kaplan–Meier survival curves for clinic 1 and clinic 2, showing better retention in clinic 2.
Detailed survival curve table with time, group, at-risk counts, survival probability, standard error, and confidence limits.
Detailed Kaplan–Meier tabular output written by BESH Stat NG.

Interpreting the Kaplan–Meier plot

This figure tells an important story immediately.

The red curve for clinic 1 drops more quickly over time, while the blue curve for clinic 2 stays consistently higher. Because the event is departure from treatment, a higher survival curve means better retention.

So even before formal testing, the visual impression is clear: patients in clinic 2 appear to remain in treatment longer than patients in clinic 1.

A few reminders for readers who are new to survival plots:

  • each downward step marks one or more events
  • short tick marks indicate censored observations
  • the dashed lines show the 95% confidence interval around each curve

That combination of curve, censor marks, and confidence limits is one reason Kaplan–Meier plots are so useful in applied medical research.

Testing the clinic difference with the log-rank test

A visual difference is useful, but it is not enough by itself. We also want a formal hypothesis test.

In the Logrank Test dialog, I used the same three fields:

  • Survival Times = survt
  • Censorship identifier = status
  • Group ID = clinic
Log-rank test dialog with time, censoring, and group fields completed.
Log-rank test input dialog for comparing clinic-specific survival curves.

For weighting, I used the standard log-rank option.

Log-rank weighting options including log-rank, Gehan-Breslow, Tarone-Ware, Peto-Peto, and Modified Peto-Peto.
Log-rank weighting options, with the standard log-rank test selected.

The output shows:

  • Chi-square = 27.8927
  • Two-sided p-value = 1.28234E-07
Excel output showing median survival time by clinic and log-rank test results.
Median survival time and log-rank test results for the two clinics.

This is strong evidence that the retention curves differ between the two clinics.

In practical language: the observed difference in patient retention is far too large to be explained by random sampling variation alone.

BESH Stat NG also reports an approximate hazard ratio for the two-group comparison. In this output, the hazard ratio for clinic 1 versus clinic 2 is about 2.83, with an approximate 95% confidence interval from 2.04 to 3.93. Since the event is departure from clinic, that means patients in clinic 1 had a substantially higher hazard of leaving treatment than patients in clinic 2.

Median survival time

The median survival table adds another intuitive summary.

For clinic 1, the estimated median retention time is 428 days, with a 95% confidence interval of 341 to 512 days.

For clinic 2, the median is not reached during the observed follow-up period. That is actually a useful result, not an error. It means the clinic 2 survival curve did not fall below 0.5 during follow-up, so more than half of those patients were still retained long enough that the median could not be observed directly from the available time window.

This is a nice teaching point: in survival analysis, “median not reached” can itself be informative and often indicates relatively favorable retention.

Reproducing the same analysis with survival UDFs

One of the strongest parts of BESH Stat NG is that the same analysis can also be performed with worksheet formulas. That matters for reproducibility, teaching, and building reusable Excel templates.

Kaplan–Meier table UDF

Excel function wizard for BESH.SURV.KM_TABLE.
Using BESH.SURV.KM_TABLE to generate a Kaplan–Meier table directly in the worksheet.
=BESH.SURV.KM_TABLE(D:D,C:C,B:B)

This returns a tabular Kaplan–Meier curve by group, including:

  • group
  • time
  • at risk
  • survival probability
  • standard error
  • lower 95% CI
  • upper 95% CI

Median survival time UDF

Excel function wizard for BESH.SURV.MEDIAN_CI.
Using BESH.SURV.MEDIAN_CI to return median survival times and confidence intervals.

=BESH.SURV.MEDIAN_CI(D:D,C:C,B:B)

This is useful when you want a compact table of median survival times and confidence intervals by group.

Log-rank statistic UDF

Excel function wizard for BESH.SURV.LOGRANK_STAT.
Using BESH.SURV.LOGRANK_STAT to return the log-rank chi-square statistic in a cell.

=BESH.SURV.LOGRANK_STAT(D:D,C:C,B:B,,"logrank")

Log-rank p-value UDF

Excel function wizard for BESH.SURV.LOGRANK_P
Using BESH.SURV.LOGRANK_P to return the log-rank p-value in a cell.

=BESH.SURV.LOGRANK_P(D:D,C:C,B:B,,"logrank")

This formula-based workflow is especially attractive when you want your spreadsheet to document both the input data and the statistical logic in visible cells rather than hiding everything inside a one-time dialog run.

Why move from Kaplan–Meier to Cox regression?

Kaplan–Meier curves and the log-rank test are excellent for comparing groups, but they do not adjust for multiple predictors at the same time.

In this dataset, clinic is clearly important, but it may not be the whole story. We also have:

  • prison record
  • methadone dose

A Cox proportional hazards model lets us estimate how each predictor is associated with the hazard of departure while holding the others constant.

That is the natural next step in a medical-research workflow.

Cox proportional hazards regression in BESH Stat NG

In the Cox regression dialog, I set:

  • Time = survt
  • Censoring = status
  • Predictors = clinic, prison, dose
Cox regression variable selection dialog in BESH Stat NG.
Cox regression variable selection in BESH Stat NG.

Then in the model-building tab I added all three effects.

Cox regression build model dialog showing selected effects.
Cox regression model-building tab with clinic, prison, and dose effects added.

In the options tab I used Efron tie handling.

Cox regression options including tie handling and residual diagnostics settings.
Cox regression options tab with Efron tie handling selected.

After fitting the model, BESH Stat NG returned the following main results:

Cox regression results table with coefficients, standard errors, p-values, hazard ratios, and confidence intervals.
Cox proportional hazards regression results for clinic, prison record, and methadone dose.

Interpreting the Cox model

The key idea in Cox regression is the hazard ratio (HR).

Because the event here is departure from clinic, the interpretation is:

  • HR > 1 means higher hazard of leaving treatment sooner
  • HR < 1 means lower hazard of leaving treatment sooner, which corresponds to better retention

Clinic

The coefficient for clinic is negative, and the hazard ratio is about 0.364.

Since clinic is coded 1 and 2 in this example, moving from clinic 1 to clinic 2 is associated with a substantially lower hazard of departure. In practical terms, after adjusting for prison record and dose, clinic 2 still shows better retention than clinic 1.

This result is also highly statistically significant.

Prison record

The hazard ratio for prison is about 1.386, with a p-value just above 0.05.

That suggests patients with a prison record may have a higher hazard of departure, but in this particular model the evidence is borderline rather than definitive. This is a good example of why it is better to say “associated with” rather than to overstate the conclusion.

Methadone dose

The coefficient for dose is negative, and the hazard ratio is about 0.965 per mg/day.

So higher methadone dose is associated with a lower hazard of departure, meaning better retention. In this model, dose is strongly significant.

For applied readers, the take-home message is simple: higher dose appears to be associated with longer retention in treatment.

Global model fit

The model-level tests are also strongly significant, which tells us that the predictors as a set contribute useful information about time to departure.

Cox regression with UDFs

BESH Stat NG also supports a UDF workflow for Cox models.

Excel worksheet showing BESH Stat NG Cox UDF formulas and outputs.
Worksheet-based Cox model workflow using BESH.SURV.COX_FIT, BESH.SURV.COX_SUMMARY, and BESH.SURV.COX_TESTS.

In the example worksheet, the model is fitted once and then reused:

=BESH.SURV.COX_FIT(C2:C239,B2:B239,D2:F239)

Then the fitted object is summarized with:
=BESH.SURV.COX_SUMMARY(H2)
And model-level tests can be returned with:
=BESH.SURV.COX_TESTS(H2)
This is a very practical design. You can fit the model once, keep the returned handle in a cell, and then pull different outputs into separate tables. For teaching and report-building, that is much cleaner than repeatedly refitting the same model.

BESH Stat NG also includes additional Cox-related capabilities such as optional robust variance, residual output, proportional-hazards testing, and baseline or adjusted survival outputs, but for many junior researchers the most important first step is learning how to interpret the main coefficient table correctly.

Practical takeaways

This example shows a complete survival-analysis workflow in Excel:

  1. Use Kaplan–Meier curves to visualize retention over time.
  2. Use the log-rank test to compare groups formally.
  3. Use Cox regression to adjust for multiple predictors.
  4. Use UDFs when you want reproducible worksheet-based analysis.

For this methadone clinic dataset, the practical conclusions are straightforward:

  • retention differs clearly between the two clinics
  • clinic 2 shows better retention than clinic 1
  • higher methadone dose is associated with better retention
  • prison record may matter, but the adjusted evidence is weaker in this model than for clinic or dose

Final thoughts

For medical researchers and students, survival analysis can look intimidating at first. But this example shows that the workflow becomes much more manageable when the software makes both the inputs and the outputs transparent.

That is what I like about BESH Stat NG in this context: it supports both a dialog-driven workflow for interactive analysis and a UDF workflow for reproducible spreadsheet models. That combination is especially useful in teaching, exploratory work, and practical applied research.

To try this example yourself:

 

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.