ROC analysis in Excel with BESHStatNG

Receiver operating characteristic (ROC) analysis is one of the most useful tools for evaluating a diagnostic marker or binary classifier. In this tutorial, we show how to run ROC analysis in Excel with BESH Stat NG using both the GUI workflow and the UDF workflow.

The example uses a real diagnostic dataset based on the Breast Cancer Wisconsin (Diagnostic) data. We treat malignant cases as the positive class and use worst concave points as the default marker.

This tutorial is designed to mirror the existing BESH Stat NG tutorial style:

  • a real-world dataset
  • a GUI workflow
  • a worksheet/UDF workflow
  • a short ROC background section
  • practical interpretation of the output

Download the example workbook

The workbook contains these sheets:

  • README — short overview and expected results
  • Data — the real-world dataset used in the tutorial
  • UDF_Workflow — worksheet formulas using BESH Stat NG UDFs
  • Reference_ROC — workbook-native ROC preview and threshold interpretation
  • GUI_output — example GUI output
  • Sources — dataset and documentation links

Example data

The dataset contains 569 fine-needle aspirate breast mass cases. For the UDF workflow, the outcome is recoded as:

  • 1 = malignant
  • 0 = benign

The default ROC marker in this tutorial is:

  • Worst concave points

This is a good teaching example because it has strong discrimination between malignant and benign cases, giving a clear ROC curve and a high area under the curve (AUC).

What ROC analysis tells us

ROC analysis studies how well a continuous marker separates two groups across all possible decision thresholds.

For any cut-off value, we can compute:

  • Sensitivity = proportion of positives correctly identified
  • Specificity = proportion of negatives correctly identified

The ROC curve plots:

  • X-axis: 1 − Specificity
  • Y-axis: Sensitivity

A marker with no discrimination gives a curve close to the diagonal line. A strong marker rises quickly toward the upper-left corner.

The area under the ROC curve (AUC) summarizes overall discrimination:

  • AUC = 0.5 means no discrimination
  • AUC = 1.0 means perfect discrimination

In BESH Stat NG, ROC output reports:

  • Wilcoxon AUC
  • DeLong standard error and confidence interval
  • Hanley–McNeil standard error and confidence interval
  • two-sided p-value for testing AUC = 0.5
  • a full cut-off table with sensitivity and specificity

GUI workflow

Step 1. Open the ROC tool

Go to: BESH Stat NG → Analyse → Graphics → ROC Curve

Excel ribbon screenshot showing BESH Stat NG menu path Analyse, Graphics, ROC Curve for ROC analysis in Excel.
BESH Stat NG ribbon path for ROC analysis in Excel: Analyse → Graphics → ROC Curve.

Step 2. Select the input data

In this example, the GUI workflow uses:

  • Group by ID
  • Group ID: Data!$C$5:$C$574
  • Data: Data!$D$5:$D$574

Here:

  • column C contains the diagnosis labels
  • column D contains the marker values
BESH Stat NG ROC Curve input dialog in Excel with Group by ID selected and data ranges for diagnosis and marker values.
ROC Curve input dialog in BESH Stat NG with Group by ID selected and the diagnosis and marker ranges assigned.

Step 3. Choose the output location

Select New Worksheet and click Compute.

BESH Stat NG returns:

  • the numerical ROC summary
  • the cut-off table
  • the ROC chart
BESH Stat NG ROC analysis results in Excel showing AUC, DeLong and Hanley-McNeil confidence intervals, p-value, sensitivity, specificity, and cut-off values.
ROC analysis output in Excel with AUC, confidence intervals, standard errors, p-value, and the cut-off table for sensitivity and specificity.

 

ROC curve in Excel generated with BESH Stat NG for a breast cancer diagnostic marker, showing high sensitivity and specificity with AUC around 0.967.
ROC curve produced in Excel for the breast cancer diagnostic example, showing excellent discrimination with AUC about 0.967.

GUI results for this example

Using worst concave points as the marker, BESH Stat NG returns:

  • Wilcoxon AUC: 0.966703663
  • DeLong 95% CI: 0.9521635 to 0.9812438
  • DeLong standard error: 0.007418605
  • Hanley–McNeil 95% CI: 0.9521618 to 0.9812455
  • Hanley–McNeil standard error: 0.007419458
  • Two-sided p-value (AUC different from 0.5): 1.85954E-77

This is an excellent ROC result. The AUC is close to 0.97, which means malignant cases tend to receive much higher marker values than benign cases.

The curve also hugs the upper-left region of the plot, which is exactly what we want to see from a strong diagnostic marker.

UDF workflow in Excel

The GUI is convenient for one-off analysis, but the UDF workflow is better when you want a reproducible workbook, a template, or a fully formula-driven tutorial.

BESH Stat NG provides two ROC worksheet functions:

  • BESH.PLOT.ROC_STATS(...) — numerical ROC summary
  • BESH.PLOT.ROC_POINTS(...) — chart-ready ROC points

ROC summary formula

Paste this formula into a blank cell:

=BESH.PLOT.ROC_STATS(Data!$D$6:$D$574,Data!$B$6:$B$574,1,"higher",0.05)

This returns a spill range with the main numerical results, including:

  • AUC
  • standard errors
  • confidence intervals
  • p-value

ROC points formula

Paste this formula into another blank cell:

=BESH.PLOT.ROC_POINTS(Data!$D$6:$D$574,Data!$B$6:$B$574,1,"higher",0.05)

This returns a spill table with ROC coordinates and thresholds, including:

  • Threshold
  • Sensitivity
  • Specificity
  • FalsePositiveRate
  • TruePositiveRate

Meaning of the arguments

In both formulas:

  • the first argument is the marker
  • the second argument is the status
  • the third argument is the positive class
  • the fourth argument is the direction
  • the fifth argument is the alpha level

So in this tutorial:

  • Data!$D$6:$D$574 = marker values
  • Data!$B$6:$B$574 = 1/0 status values
  • 1 = malignant is the positive class
  • "higher" = higher marker values indicate malignant cases
  • 0.05 = 95% confidence intervals

Optional text-label version

If you prefer to use the diagnosis labels directly instead of the recoded 1/0 column, you can use:

=BESH.PLOT.ROC_STATS(Data!$D$6:$D$574,Data!$C$6:$C$574,"Malignant","higher",0.05)

and

=BESH.PLOT.ROC_POINTS(Data!$D$6:$D$574,Data!$C$6:$C$574,"Malignant","higher",0.05)

Building the ROC chart from the spill range

After entering BESH.PLOT.ROC_POINTS(...):

  1. select the spilled output
  2. insert an XY Scatter with Straight Lines
  3. use FalsePositiveRate for the X-axis
  4. use TruePositiveRate for the Y-axis

This gives a worksheet-native ROC chart that updates automatically when the source data changes.

Validation and reference checking

The ROC workflow shown in this tutorial is designed to be transparent and reproducible. In BESH Stat NG, worksheet-based workflows can be checked directly against the GUI output, the downloadable example workbook, and the project’s broader validation materials. For more detail on how results are checked, including reference datasets, automated tests, and comparison workflows, see the Validation page.

Interpreting the cut-off table

A ROC curve is useful because it shows the trade-off between sensitivity and specificity across all thresholds.

In the example workbook, the reference sheet also identifies the best Youden threshold, which is the cut-off that maximizes:

Sensitivity + Specificity − 1

For this dataset, the workbook gives a Youden-optimal threshold of about:

  • Threshold: 0.1358
  • Sensitivity: 0.8679
  • Specificity: 0.9440

This is a strong practical cut-off for teaching purposes because it gives both high sensitivity and high specificity.

However, in real clinical use, the best threshold may depend on the cost of false positives and false negatives. For example:

  • if missing a malignant case is very costly, you may prefer a lower threshold and higher sensitivity
  • if unnecessary follow-up procedures are very costly, you may prefer a higher threshold and higher specificity

So the Youden cut-off is a useful summary, but it is not always the final decision threshold.

Practical interpretation of this example

This example shows that worst concave points is a very strong discriminator between malignant and benign cases in this dataset.

The evidence is consistent across the output:

  • the ROC curve rises steeply
  • the AUC is very high
  • both standard error methods give almost identical results
  • the p-value is extremely small
  • the threshold table shows that high sensitivity can be achieved without sacrificing too much specificity

In plain language, that means the marker does a very good job of ranking malignant cases above benign cases.

Why use both GUI and UDF workflows?

The two workflows complement each other.

GUI workflow advantages

  • quick exploratory analysis
  • immediate chart output
  • easy for first-time users
  • useful for ad hoc reporting

UDF workflow advantages

  • reproducible formula-based analysis
  • easier to audit and maintain
  • ideal for templates and tutorials
  • easy to connect with other workbook calculations
  • dynamic charting from spill ranges

A good Excel tutorial should usually show both.

Notes and tips

  • The marker column must be numeric.
  • The status column must contain exactly two groups.
  • For UDF formulas, it is usually safest to exclude the header row.
  • If smaller values indicate the positive class, change "higher" to "lower".
  • If a BESH UDF returns #NAME?, make sure the add-in is loaded and recalculate the workbook.
  • If you want to compare markers, duplicate the UDF workflow and replace the marker range with another variable.

In the example workbook, you can also try:

  • Mean radius
  • Worst area

This is a simple way to compare which single marker gives the strongest ROC performance.

Suggested structure for your own ROC workbook templates

A reusable ROC workbook in Excel can follow this pattern:

  1. Data sheet
    Store the raw outcome and marker variables.
  2. UDF workflow sheet
    Use BESH.PLOT.ROC_STATS(...) and BESH.PLOT.ROC_POINTS(...).
  3. Reference / interpretation sheet
    Add threshold summaries, Youden index, and comments.
  4. Chart sheet or dashboard
    Show the ROC curve and key metrics.

This structure makes the workbook easy to teach, reuse, and adapt to other biomarker datasets.

Summary

BESH Stat NG makes ROC analysis in Excel straightforward in two different ways.

The GUI workflow is fast and visual.
The UDF workflow is reproducible and workbook-friendly.

In this tutorial example, the marker worst concave points gives an AUC of 0.9667, which indicates excellent discrimination between malignant and benign cases.

That makes this dataset a strong teaching example for:

  • ROC basics
  • AUC interpretation
  • threshold selection
  • GUI workflow in BESH Stat NG
  • reproducible UDF-based Excel analysis

Frequently asked questions

What is ROC analysis in Excel used for?

ROC analysis in Excel is used to evaluate how well a continuous marker or prediction score separates two groups, usually a positive and a negative class. It helps assess diagnostic accuracy across all possible cut-off values.

What does AUC mean in ROC analysis?

AUC, or area under the curve, summarizes overall discrimination. An AUC of 0.5 means no discrimination, while an AUC closer to 1.0 indicates excellent separation between the two groups.

What data do I need for ROC analysis in Excel?

You need a binary outcome variable and a numeric marker or prediction score. In this tutorial, the outcome is malignant versus benign, and the marker is worst concave points.

Can BESH Stat NG perform ROC analysis in Excel with formulas?

Yes. BESH Stat NG supports ROC analysis through both the graphical interface and worksheet UDFs. This makes it possible to build reproducible, formula-based Excel workflows for ROC curves, AUC estimation, and threshold analysis.

How do I choose the best ROC cut-off?

A common approach is to use the Youden index, which maximizes sensitivity plus specificity minus one. However, in practice, the best cut-off also depends on the relative cost of false positives and false negatives.

See also