Sample size planning in Excel with BESHStatNG

Sample size planning is one of the most important steps in study design. Before collecting data, you need to know how many subjects, pairs, or events are required to detect a clinically meaningful effect with acceptable power.

BESHStatNG makes it possible to perform these planning workflows directly in Excel. The sample size section covers not only classical continuous and proportion-based designs, but also survival designs, reliability studies, and agreement studies.

This tutorial shows how to use the Sample Size section of the BESH Stat NG ribbon together with the companion workbook to plan studies for:

The goal is not only to show where to click, but also to explain which planning tool matches which study question, how to interpret the returned numbers, and how to reuse the workbook as a formula-based planning template.

In this tutorial, you will learn how to

  • choose the correct sample size tool for a given study design
  • enter planning assumptions in the BESHStatNG dialogs
  • interpret required numbers of subjects, pairs, and events
  • understand the difference between superiority, non-inferiority, and equivalence planning
  • use the workbook to keep planning assumptions and UDF-based calculations together in one Excel file

Download the tutorial workbook

This tutorial uses one workbook:

[Download the sample size planning workbook]

The workbook contains:

  • Overview — what the workbook includes
  • Scenario_Inputs — editable planning assumptions
  • BESH_UDFs — formula-driven sample size calculations using the BESH sample size functions
  • Function_Guide — quick function signatures and examples
  • Reference_Notes — page-building notes and reminders
  • GUI_Screenshots — optional space for screenshot assembly

The workbook is designed so you can follow the tutorial step by step, reproduce the dialog calculations in Excel, and later adapt the same structure to your own studies.

Start from the BESH Stat NG ribbon

All methods used in this tutorial are available from the Sample Size section of the BESH Stat NG ribbon.

 

BESH Stat NG ribbon in Excel showing the Sample Size menu with paired t-test, unpaired t-test, single proportion, independent proportions, log-rank test, Cox regression, intraclass correlation, and Bland–Altman agreement options.
Start the tutorial from the Sample Size section of the BESH Stat NG ribbon.

Start by opening:

BESH Stat NG → Analyse → Sample Size

From there, select the planning tool that matches your study design.

Part I — Continuous outcomes

Paired t-test

Use the paired t-test sample size tool when each subject contributes a matched pair of measurements, such as:

  • before/after measurements
  • left/right side comparisons
  • matched laboratory measurements
  • repeated measurements on the same subject

In the tutorial example, the dialog uses:

  • Mean Difference = 2
  • Standard Deviation = 5
  • Alpha = 0.050
  • Beta = 0.200
BESHStatNG Sample Size Paired T-test dialog with mean difference 2, standard deviation 5, alpha 0.050, beta 0.200, and estimated number of pairs 52.
Paired t-test planning example requiring 52 pairs.

From the results:

  • Estimated Number of Pairs = 52

How to interpret it

This means the study needs 52 paired observations to detect a mean paired difference of 2 units with 80% power at a 5% significance level.

The key point is that this calculation is based on the standard deviation of the paired differences, not the standard deviation of the raw measurements. In paired designs, the relevant source of variability is the within-pair difference.

Unpaired t-test

Use the unpaired t-test tool when you want to compare the means of two independent groups.

BESHStatNG supports three planning modes:

  • superiority
  • non-inferiority
  • equivalence

Unpaired t-test — superiority

This design is appropriate when the goal is to show that the group means differ.

In the tutorial example, the inputs are:

  • Mean Difference = 2
  • Standard Deviation = 5
  • Ratio of control to experimental subjects = 1
  • Alpha = 0.050
  • Beta = 0.200
  • Hypothesis Type = Superiority
BESHStatNG Unpaired T-test sample size dialog in superiority mode with mean difference 2, standard deviation 5, equal allocation, and estimated 100 subjects per group.
Two-group superiority design for continuous outcomes.

From the results:

  • Estimated Number of Control subjects = 100
  • Estimated Number of Experimental subjects = 100

Interpretation

A balanced two-group superiority study with these assumptions requires 100 subjects per group.

Unpaired t-test — non-inferiority

Use non-inferiority planning when the aim is to show that the experimental group is not worse than the control group by more than a predefined margin.

In the tutorial example, the inputs are:

  • Expected Mean Difference = 1
  • Non-inferiority Margin = 2
  • Standard Deviation = 4
  • Ratio = 1
  • One-sided Alpha = 0.025
  • Beta = 0.200
  • Hypothesis Type = Noninferiority
BESHStatNG Unpaired T-test sample size dialog in noninferiority mode with expected mean difference 1, noninferiority margin 2, standard deviation 4, and estimated 29 subjects per group.
Non-inferiority sample size planning for two independent means.

From the results:

  • Estimated Number of Control subjects = 29
  • Estimated Number of Experimental subjects = 29

Interpretation

Under these assumptions, a non-inferiority design requires 29 subjects per group. The margin defines the largest clinically acceptable loss that would still support non-inferiority.

Unpaired t-test — equivalence

Use equivalence planning when the goal is to show that the true mean difference lies within a predefined acceptable interval.

In the tutorial example, the inputs are:

  • Expected Mean Difference = 1
  • Equivalence Margin = 2
  • Standard Deviation = 4
  • Ratio = 1
  • One-sided Alpha = 0.025
  • Beta = 0.200
  • Hypothesis Type = Equivalence
BESHStatNG Unpaired T-test sample size dialog in equivalence mode with expected mean difference 1, equivalence margin 2, standard deviation 4, and estimated 253 subjects per group.
Equivalence planning often requires the largest sample size.

From the results:

  • Lower-bound requirement: Controls = 29, Experimental = 29
  • Upper-bound requirement: Controls = 253, Experimental = 253
  • Driving bound = Upper bound
  • Estimated Number of Controls = 253
  • Estimated Number of Experimental subjects = 253

Interpretation

Equivalence planning is usually more demanding than superiority or non-inferiority. In this example, the upper equivalence bound drives the sample size, so the study needs 253 subjects per group.

Part II — Proportions

Single proportion

Use this procedure when the outcome is binary and the study question compares an anticipated proportion against a null hypothesis proportion.

In the tutorial example, the inputs are:

  • Proportion = 0.60
  • Null Hypothesis Proportion = 0.50
  • Alpha = 0.050
  • Beta = 0.200
BESHStatNG Sample Size Single Proportion dialog with proportion 0.6, null hypothesis proportion 0.5, alpha 0.050, beta 0.200, and estimated 189 subjects.
One-sample proportion planning example.

From the results:

  • Estimated Number of Subjects = 189

Interpretation

A one-sample proportion study with these assumptions requires 189 subjects. This is a common design for prevalence studies, quality benchmarks, and single-arm studies with binary outcomes.

Independent proportions

Use this procedure when you are comparing proportions between two independent groups, such as response rates, event rates, or complication rates.

BESHStatNG supports:

  • superiority
  • non-inferiority
  • equivalence

The output is especially useful because it reports different recommendations for the uncorrected chi-square approach and for the corrected chi-square / Fisher-style approach.

Independent proportions — superiority

Inputs used in the tutorial example:

  • Control Group Proportion = 0.30
  • Experimental Group Proportion = 0.50
  • Ratio = 1
  • Alpha = 0.050
  • Beta = 0.200
  • Hypothesis Type = Superiority
BESHStatNG Independent Proportions dialog in superiority mode with control proportion 0.3, experimental proportion 0.5, and estimated 93 per group uncorrected or 103 per group corrected.
Superiority planning for two independent proportions.

From the results:

  • Uncorrected chi-square: Controls = 93, Experimental = 93
  • Corrected chi-square / Fisher exact: Controls = 103, Experimental = 103

Interpretation

For practical planning, the corrected or Fisher-style recommendation is usually the safer choice. In this example, that means planning for 103 subjects per group.

Independent proportions — non-inferiority

Inputs used in the tutorial example:

  • Control Group Proportion = 0.30
  • Experimental Group Proportion = 0.50
  • Non-inferiority Margin = 0.10
  • Ratio = 1
  • One-sided Alpha = 0.025
  • Beta = 0.200
  • Hypothesis Type = Noninferiority
BESHStatNG Independent Proportions dialog in noninferiority mode with control proportion 0.3, experimental proportion 0.5, margin 0.1, and estimated 35 or 42 per group depending on method.
Non-inferiority planning for two independent proportions.

From the results:

  • Uncorrected chi-square: Controls = 35, Experimental = 35
  • Corrected chi-square / Fisher exact: Controls = 42, Experimental = 42

Interpretation

This design requires 42 subjects per group if you use the more conservative corrected / Fisher-style planning result.

Independent proportions — equivalence

Inputs used in the tutorial example:

  • Control Group Proportion = 0.30
  • Experimental Group Proportion = 0.50
  • Equivalence Margin = 0.25
  • Ratio = 1
  • One-sided Alpha = 0.025
  • Beta = 0.200
  • Hypothesis Type = Equivalence
BESHStatNG Independent Proportions dialog in equivalence mode with control proportion 0.3, experimental proportion 0.5, equivalence margin 0.25, and final corrected requirement 1504 subjects per group.
Equivalence planning for proportions can become very demanding.

From the results:

  • Lower-bound requirement, uncorrected: Controls = 14, Experimental = 14
  • Upper-bound requirement, uncorrected: Controls = 1464, Experimental = 1464
  • Lower-bound requirement, corrected / Fisher: Controls = 19, Experimental = 19
  • Upper-bound requirement, corrected / Fisher: Controls = 1504, Experimental = 1504
  • Driving bound = Upper bound
  • Final corrected / Fisher requirement: Controls = 1504, Experimental = 1504

Interpretation

Equivalence planning for proportions can become very demanding. In this example, the upper bound dominates the calculation, resulting in 1504 subjects per group under the corrected / Fisher-style recommendation.

Part III — Time-to-event designs

Log-rank test

Use log-rank planning when the main study question compares two survival curves.

In the tutorial example, the inputs are:

  • Hazard Ratio = 0.70
  • Control Event Proportion = 0.30
  • Experimental Event Proportion = 0.22
  • Ratio = 1
  • Alpha = 0.050
  • Beta = 0.200
BESHStatNG Log-rank sample size dialog with hazard ratio 0.7, control event proportion 0.3, experimental event proportion 0.22, and estimated 950 total subjects with 247 required events.
Log-rank sample size planning for two-group survival comparisons.

From the results:

  • Required events = 247
  • Estimated Number of Controls = 475
  • Estimated Number of Experimental subjects = 475
  • Estimated Total Number of Subjects = 950

Interpretation

For time-to-event studies, the required number of events is often the key design quantity. The total number of subjects depends on how frequently events are expected to occur. In this example, 247 events translate into 950 total subjects.

Cox regression — binary covariate

Use this procedure when the primary Cox model effect is a binary predictor, such as treatment group or presence/absence of a risk factor.

In the tutorial example, the inputs are:

  • Hazard Ratio = 0.70
  • Ratio of control to experimental subjects = 1
  • Overall Event Proportion = 0.26
  • R-squared with Other Covariates = 0
  • Alpha = 0.050
  • Beta = 0.200
  • Covariate Type = Binary Covariate
BESHStatNG Cox regression sample size dialog for a binary covariate with hazard ratio 0.7, overall event proportion 0.26, and estimated 950 subjects.
Cox regression planning with a binary predictor.

From the results:

  • Required events = 247
  • Estimated Number of Subjects = 950

Interpretation

The calculation shows that the model needs 247 events, corresponding here to 950 subjects. The overall event proportion links the event requirement to the projected total sample size.

Cox regression — continuous covariate

Use this version when the predictor is continuous, such as a biomarker, age, or a quantitative measurement.

In the tutorial example, the inputs are:

  • Hazard Ratio per Unit = 1.25
  • Covariate Standard Deviation = 2.5
  • Overall Event Proportion = 0.30
  • R-squared with Other Covariates = 0.10
  • Alpha = 0.050
  • Beta = 0.200
  • Covariate Type = Continuous Covariate
BESHStatNG Cox regression sample size dialog for a continuous covariate with hazard ratio per unit 1.25, covariate SD 2.5, and estimated 97 subjects.
Cox regression planning with a continuous predictor.

From the results:

  • Required events = 29
  • Estimated Number of Subjects = 97

Interpretation

Continuous-predictor planning uses the hazard ratio per unit increase together with the covariate standard deviation. In this example, the assumed effect size is strong enough that only 29 events and about 97 subjects are required.

Part IV — Reliability and agreement planning

Intraclass correlation (ICC)

Use ICC planning when the study goal is to estimate or test reliability across repeated observations, raters, or measurement methods.

In the tutorial example, the inputs are:

  • Null ICC = 0.50
  • Alternative ICC = 0.75
  • Observations per Subject = 3
  • One-sided Alpha = 0.050
  • Beta = 0.200
BESHStatNG Intraclass Correlation sample size dialog with null ICC 0.5, alternative ICC 0.75, three observations per subject, and estimated 23 subjects.
Reliability study planning with ICC.

From the results:

  • Estimated Number of Subjects = 23
  • Achieved power = 0.8049

Interpretation

With 3 observations per subject, the study requires 23 subjects to distinguish a reliability level of 0.75 from a null value of 0.50 at approximately 80% power.

Agreement planning for Bland–Altman analysis

Use this procedure when the goal is not to compare means or proportions, but to plan how precisely the limits of agreement will be estimated.

In the tutorial example, the inputs are:

  • SD of Differences = 5
  • Desired LoA CI Half-Width = 2
  • LoA Multiplier = 1.96
  • Alpha = 0.050
BESHStatNG Bland–Altman agreement sample size dialog with SD of differences 5, desired limits-of-agreement CI half-width 2, and estimated 74 pairs.
Agreement study planning based on Bland–Altman precision targets.

From the results:

  • Estimated Number of Pairs = 74
  • Achieved LoA CI half-width = 1.9887

Interpretation

This means the agreement study needs 74 paired measurements so that the confidence interval around the limits of agreement is narrow enough to achieve the desired precision target.

This is a different design goal from hypothesis testing. The focus is not on rejecting a null hypothesis, but on estimating agreement with acceptable precision.

How to choose the right sample size tool

A simple rule is:

  • use Paired T-test for matched or before/after continuous outcomes
  • use Unpaired T-test for two independent group means
  • use Single Proportion for one-sample binary designs
  • use Independent Proportions for two-group binary outcomes
  • use Log-rank for two-group survival comparisons
  • use Cox Regression when the primary analysis is a Cox model
  • use ICC for reliability studies
  • use Agreement (Bland–Altman) when the goal is precise limits-of-agreement estimation

Results match the workbook and UDF calculations

The companion workbook is designed to mirror the GUI examples shown in this tutorial.

The Scenario_Inputs sheet stores the planning assumptions.
The BESH_UDFs sheet evaluates the corresponding BESH.SSIZE.* functions directly in Excel.

That makes the workbook useful for two purposes:

  • reproducing the dialog examples shown on this page
  • building reusable planning templates for future studies

In other words, the GUI and the workbook are not separate workflows. They are two views of the same planning logic: one dialog-based and one formula-based.

Final remarks

Sample size planning is always only as good as the assumptions used. In real studies, the most important step is often not the button click, but choosing realistic values for:

  • effect size
  • standard deviation
  • event proportion
  • clinically acceptable margin
  • reliability target
  • desired agreement precision

BESHStatNG helps bring these design decisions into a clear, reproducible Excel workflow.

For full method details, assumptions, and formulas, see the online help and the sample size UDF documentation.