Survival analysis in Excel

Survival analysis is one of the most common workflows in clinical and biomedical research. In BESHStatNG, you can perform the core survival methods directly in Excel, including Kaplan–Meier analysis, the log-rank test, and Cox proportional hazards regression.

This tutorial uses a real clinical study example based on the Veterans’ Administration Lung Cancer Trial dataset. The goal is to show a complete Excel-based workflow: preparing the data, running the analysis with the BESHStatNG GUI, reviewing the results, and comparing them with reference results stored in the workbook.

In this tutorial, you will learn how to:

  • create Kaplan–Meier survival estimates by treatment group
  • test whether survival differs between groups using the log-rank test
  • fit a Cox proportional hazards model with multiple predictors
  • interpret hazard ratios and p-values in a practical way
  • compare BESHStatNG results with reference outputs in the workbook

Download the tutorial workbook

Download the workbook used in this tutorial: [Survival Analysis in Excel workbook]

The workbook is designed so you can follow the tutorial step by step and then reuse the same structure for your own survival analysis projects in Excel.

This workbook contains:

  • the raw clinical study dataset
  • a cleaned analysis sheet prepared for BESHStatNG
  • a worksheet with BESHStatNG UDF formulas
  • reference Kaplan–Meier and log-rank results
  • reference Cox regression results
  • GUI output sheets and screenshots used in this tutorial

The example dataset

This tutorial uses the Veterans’ Administration Lung Cancer Trial dataset, a well-known clinical survival dataset with follow-up time, censoring status, treatment group, and several patient-level predictors.

For the tutorial workbook, the main analysis sheet uses these variables:

  • trt_label — treatment arm label
  • time — survival time
  • status_event — event indicator, where 1 = event and 0 = censored
  • trt_test — coded treatment variable
  • karno — Karnofsky performance score
  • age — age in years
  • prior_yes — prior therapy indicator
  • diagtime — time from diagnosis
  • celltype — tumor cell type

The workbook contains these main sheets:

  • Analysis_Data — analysis-ready data for BESHStatNG
  • BESH_UDFs — formula-driven survival analysis with BESHStatNG UDFs
  • Ref_KM_Logrank — reference Kaplan–Meier and log-rank results
  • Ref_Cox — reference Cox model results
  • GUI_KM_logrank — GUI-based Kaplan–Meier and log-rank results
  • GUI_COX — GUI-based Cox regression setup and outputs

Step-by-step walkthrough

This tutorial follows a simple workflow:

  1. start from the prepared Analysis_Data worksheet
  2. run Kaplan–Meier Plot
  3. review the median survival and log-rank results
  4. run Cox Regression
  5. interpret the treatment effect and the covariates
BESH Stat NG ribbon with the Survival Analysis menu, including Kaplan–Meier Plot, Logrank Test, and Cox Regression.

Kaplan–Meier analysis in Excel

The first step is to estimate the survival curves for the two treatment groups (see documentation).

Open:

BESH Stat NG → Analyse → Survival Analysis → Kaplan-Meier Plot

In the workbook, use:

  • Survival Times: Analysis_Data!B:B
  • Censorship indicator: Analysis_Data!C:C
  • Group ID: Analysis_Data!A:A

Set the output to a new worksheet and run the analysis.

Kaplan–Meier input dialog in BESHStatNG using the Analysis_Data worksheet.

The grouped Kaplan–Meier analysis gives median survival time estimates for each treatment arm.

From the workbook results:

  • Standard treatment: median survival = 103 days
  • 95% CI: 56 to 126 days
  • Test treatment: median survival = 52 days
  • 95% CI: 44 to 90 days

At first glance, this looks like a noticeable difference between groups. However, median survival values alone do not tell the whole story. The more important question is whether the overall survival experience differs significantly across the full follow-up period.

Log-rank test

To compare the survival curves formally, review the log-rank test output produced by BESHStatNG (see documentation).

From the GUI results in the workbook:

  • Log-rank chi-square: 0.0082
  • Two-sided p-value: 0.9277
  • Hazard ratio (Standard vs Test): 0.9845

This means the Kaplan–Meier curves are not significantly different by treatment group in this example.

That is an important teaching point: even though the median survival estimates are different numerically, the overall survival comparison is not statistically significant based on the log-rank test.

The workbook also includes a separate output labelled Test for Equality of Median Survival Times, with p = 0.0802. That result addresses the medians specifically, whereas the log-rank test compares the full survival curves over time. In survival analysis, the log-rank test is usually the primary overall group comparison.

Kaplan–Meier and log-rank results for the Standard and Test treatment groups.

In this dataset, the treatment groups show different median survival estimates, but the log-rank test indicates no evidence of a statistically significant difference in the full survival curves. This is a good example of why survival analysis should not rely on medians alone.

Cox proportional hazards regression

The next step is to move from an unadjusted group comparison to a multivariable survival model (see documentation).

Open:

BESH Stat NG → Analyse → Survival Analysis → Cox Regression

Use the following settings:

  • Time: time
  • Censoring: status_event
  • Predictor variables: trt_test, karno, age, prior_yes

Then move to the Build Model tab and include the same predictors in the selected effects list.

Cox regression input screen with time, censoring, and predictor variables selected.

 

Cox regression build-model screen with treatment, Karnofsky score, age, and prior therapy included in the selected effects.

For the tutorial example, the options used were:

  • Ties handling: Breslow
  • alpha: 0.050
  • no robust variance
  • no proportional hazards test enabled in this run
Cox regression options screen showing Breslow ties handling and the alpha level used in this tutorial.

Interpreting the Cox model

The workbook reference results for the model

time ~ trt_test + karno + age + prior_yes

show the following:

Treatment group

  • HR = 1.209
  • 95% CI = 0.839 to 1.741
  • p = 0.308

After adjustment for the other variables, treatment group is not a statistically significant predictor of survival in this model.

Karnofsky score

  • HR = 0.967 per 1-point increase
  • 95% CI = 0.957 to 0.977
  • p < 0.000000001

Karnofsky performance score is the strongest predictor in the model. Higher Karnofsky score is associated with lower hazard, meaning better expected survival.

Age

  • HR = 0.996
  • 95% CI = 0.978 to 1.014
  • p = 0.669

Age is not statistically significant in this model.

Prior therapy

  • HR = 0.940
  • 95% CI = 0.632 to 1.399
  • p = 0.762

Prior therapy is also not statistically significant here.

The Cox model shows that, in this dataset, baseline functional status matters much more than treatment assignment. The clearest signal comes from the Karnofsky score: patients with higher performance status have substantially lower hazard. Treatment group, age, and prior therapy do not show statistically significant effects in this particular model.

You can download the full GUI based result sheet here – cox_fit_results.

Practical interpretation

This tutorial illustrates a common survival-analysis pattern:

  • a simple group comparison is useful as a first step
  • Kaplan–Meier curves help visualize survival over time
  • the log-rank test compares the groups overall
  • the Cox model shows whether treatment still matters after adjusting for covariates

In this example, the survival workflow suggests:

  • the two treatment groups do not show a significant overall survival difference
  • patient condition at baseline, measured by Karnofsky score, is a much stronger survival predictor
  • multivariable modeling helps separate treatment effects from prognostic effects

This is exactly why Kaplan–Meier and Cox regression are often used together in clinical survival analysis.

Results match reference outputs

The tutorial workbook includes reference sheets so you can compare BESHStatNG output directly with stored benchmark results.

Use:

  • Ref_KM_Logrank for Kaplan–Meier medians and log-rank results
  • Ref_Cox for Cox model coefficients, hazard ratios, confidence intervals, and p-values
  • BESH_UDFs for formula-based BESHStatNG survival functions

When the workbook is opened with BESHStatNG loaded, the UDF-based results should agree with the reference values up to normal rounding differences. For the UDFs documentation see.

Key reference values in this tutorial are:

  • Log-rank chi-square: 0.0082
  • Log-rank p-value: 0.9277
  • Cox HR for treatment: 1.209
  • Cox HR for Karnofsky score: 0.967

This makes the workbook useful not only as a tutorial but also as a validation example. You can compare GUI-based output, UDF-based output, and reference results in one place.

R reference

Use the script below to reproduce the main survival analysis results from the tutorial workbook in R. It fits Kaplan–Meier curves by treatment group, performs the standard log-rank test, and estimates the Cox proportional hazards model with Breslow ties.

library(readxl)
library(survival)

dat <- read_excel(
  "BESHStatNG_survival_tutorial_veteran.xlsx",
  sheet = "Analysis_Data"
)

dat <- as.data.frame(dat)
dat$trt_label <- factor(dat$trt_label, levels = c("Standard", "Test"))
dat$status_event <- as.integer(dat$status_event)
dat$trt_test <- as.integer(dat$trt_test)
dat$prior_yes <- as.integer(dat$prior_yes)

survfit(Surv(time, status_event) ~ trt_label, data = dat)
survdiff(Surv(time, status_event) ~ trt_label, data = dat, rho = 0)
summary(coxph(Surv(time, status_event) ~ trt_test + karno + age + prior_yes,
              data = dat,
              ties = "breslow"))

What to do next

Once you have completed this tutorial, the next useful BESHStatNG workflows are:

If you want to adapt this workbook for your own data, use the same structure:

  • one clean analysis sheet
  • one worksheet for BESHStatNG UDFs
  • one sheet for saved GUI output
  • one sheet for reference or benchmark results

That approach makes it much easier to build reusable and well-documented Excel analysis templates.