Sample size planning is one of the most important steps in study design. Before collecting data, you need to know how many subjects, pairs, or events are required to detect a clinically meaningful effect with acceptable power.
BESHStatNG makes it possible to perform these planning workflows directly in Excel. The sample size section covers not only classical continuous and proportion-based designs, but also survival designs, reliability studies, and agreement studies.
This tutorial shows how to use the Sample Size section of the BESH Stat NG ribbon together with the companion workbook to plan studies for:
- paired t-tests
- unpaired t-tests
- single proportions
- independent proportions
- log-rank tests
- Cox regression
- intraclass correlation (ICC)
- Bland–Altman agreement studies
The goal is not only to show where to click, but also to explain which planning tool matches which study question, how to interpret the returned numbers, and how to reuse the workbook as a formula-based planning template.
In this tutorial, you will learn how to
- choose the correct sample size tool for a given study design
- enter planning assumptions in the BESHStatNG dialogs
- interpret required numbers of subjects, pairs, and events
- understand the difference between superiority, non-inferiority, and equivalence planning
- use the workbook to keep planning assumptions and UDF-based calculations together in one Excel file
Download the tutorial workbook
This tutorial uses one workbook:
[Download the sample size planning workbook]
The workbook contains:
- Overview — what the workbook includes
- Scenario_Inputs — editable planning assumptions
- BESH_UDFs — formula-driven sample size calculations using the BESH sample size functions
- Function_Guide — quick function signatures and examples
- Reference_Notes — page-building notes and reminders
- GUI_Screenshots — optional space for screenshot assembly
The workbook is designed so you can follow the tutorial step by step, reproduce the dialog calculations in Excel, and later adapt the same structure to your own studies.
Start from the BESH Stat NG ribbon
All methods used in this tutorial are available from the Sample Size section of the BESH Stat NG ribbon.

Start by opening:
BESH Stat NG → Analyse → Sample Size
From there, select the planning tool that matches your study design.
Part I — Continuous outcomes
Paired t-test
Use the paired t-test sample size tool when each subject contributes a matched pair of measurements, such as:
- before/after measurements
- left/right side comparisons
- matched laboratory measurements
- repeated measurements on the same subject
In the tutorial example, the dialog uses:
- Mean Difference = 2
- Standard Deviation = 5
- Alpha = 0.050
- Beta = 0.200

From the results:
- Estimated Number of Pairs = 52
How to interpret it
This means the study needs 52 paired observations to detect a mean paired difference of 2 units with 80% power at a 5% significance level.
The key point is that this calculation is based on the standard deviation of the paired differences, not the standard deviation of the raw measurements. In paired designs, the relevant source of variability is the within-pair difference.
Unpaired t-test
Use the unpaired t-test tool when you want to compare the means of two independent groups.
BESHStatNG supports three planning modes:
- superiority
- non-inferiority
- equivalence
Unpaired t-test — superiority
This design is appropriate when the goal is to show that the group means differ.
In the tutorial example, the inputs are:
- Mean Difference = 2
- Standard Deviation = 5
- Ratio of control to experimental subjects = 1
- Alpha = 0.050
- Beta = 0.200
- Hypothesis Type = Superiority

From the results:
- Estimated Number of Control subjects = 100
- Estimated Number of Experimental subjects = 100
Interpretation
A balanced two-group superiority study with these assumptions requires 100 subjects per group.
Unpaired t-test — non-inferiority
Use non-inferiority planning when the aim is to show that the experimental group is not worse than the control group by more than a predefined margin.
In the tutorial example, the inputs are:
- Expected Mean Difference = 1
- Non-inferiority Margin = 2
- Standard Deviation = 4
- Ratio = 1
- One-sided Alpha = 0.025
- Beta = 0.200
- Hypothesis Type = Noninferiority

From the results:
- Estimated Number of Control subjects = 29
- Estimated Number of Experimental subjects = 29
Interpretation
Under these assumptions, a non-inferiority design requires 29 subjects per group. The margin defines the largest clinically acceptable loss that would still support non-inferiority.
Unpaired t-test — equivalence
Use equivalence planning when the goal is to show that the true mean difference lies within a predefined acceptable interval.
In the tutorial example, the inputs are:
- Expected Mean Difference = 1
- Equivalence Margin = 2
- Standard Deviation = 4
- Ratio = 1
- One-sided Alpha = 0.025
- Beta = 0.200
- Hypothesis Type = Equivalence

From the results:
- Lower-bound requirement: Controls = 29, Experimental = 29
- Upper-bound requirement: Controls = 253, Experimental = 253
- Driving bound = Upper bound
- Estimated Number of Controls = 253
- Estimated Number of Experimental subjects = 253
Interpretation
Equivalence planning is usually more demanding than superiority or non-inferiority. In this example, the upper equivalence bound drives the sample size, so the study needs 253 subjects per group.
Part II — Proportions
Single proportion
Use this procedure when the outcome is binary and the study question compares an anticipated proportion against a null hypothesis proportion.
In the tutorial example, the inputs are:
- Proportion = 0.60
- Null Hypothesis Proportion = 0.50
- Alpha = 0.050
- Beta = 0.200

From the results:
- Estimated Number of Subjects = 189
Interpretation
A one-sample proportion study with these assumptions requires 189 subjects. This is a common design for prevalence studies, quality benchmarks, and single-arm studies with binary outcomes.
Independent proportions
Use this procedure when you are comparing proportions between two independent groups, such as response rates, event rates, or complication rates.
BESHStatNG supports:
- superiority
- non-inferiority
- equivalence
The output is especially useful because it reports different recommendations for the uncorrected chi-square approach and for the corrected chi-square / Fisher-style approach.
Independent proportions — superiority
Inputs used in the tutorial example:
- Control Group Proportion = 0.30
- Experimental Group Proportion = 0.50
- Ratio = 1
- Alpha = 0.050
- Beta = 0.200
- Hypothesis Type = Superiority

From the results:
- Uncorrected chi-square: Controls = 93, Experimental = 93
- Corrected chi-square / Fisher exact: Controls = 103, Experimental = 103
Interpretation
For practical planning, the corrected or Fisher-style recommendation is usually the safer choice. In this example, that means planning for 103 subjects per group.
Independent proportions — non-inferiority
Inputs used in the tutorial example:
- Control Group Proportion = 0.30
- Experimental Group Proportion = 0.50
- Non-inferiority Margin = 0.10
- Ratio = 1
- One-sided Alpha = 0.025
- Beta = 0.200
- Hypothesis Type = Noninferiority

From the results:
- Uncorrected chi-square: Controls = 35, Experimental = 35
- Corrected chi-square / Fisher exact: Controls = 42, Experimental = 42
Interpretation
This design requires 42 subjects per group if you use the more conservative corrected / Fisher-style planning result.
Independent proportions — equivalence
Inputs used in the tutorial example:
- Control Group Proportion = 0.30
- Experimental Group Proportion = 0.50
- Equivalence Margin = 0.25
- Ratio = 1
- One-sided Alpha = 0.025
- Beta = 0.200
- Hypothesis Type = Equivalence

From the results:
- Lower-bound requirement, uncorrected: Controls = 14, Experimental = 14
- Upper-bound requirement, uncorrected: Controls = 1464, Experimental = 1464
- Lower-bound requirement, corrected / Fisher: Controls = 19, Experimental = 19
- Upper-bound requirement, corrected / Fisher: Controls = 1504, Experimental = 1504
- Driving bound = Upper bound
- Final corrected / Fisher requirement: Controls = 1504, Experimental = 1504
Interpretation
Equivalence planning for proportions can become very demanding. In this example, the upper bound dominates the calculation, resulting in 1504 subjects per group under the corrected / Fisher-style recommendation.
Part III — Time-to-event designs
Log-rank test
Use log-rank planning when the main study question compares two survival curves.
In the tutorial example, the inputs are:
- Hazard Ratio = 0.70
- Control Event Proportion = 0.30
- Experimental Event Proportion = 0.22
- Ratio = 1
- Alpha = 0.050
- Beta = 0.200

From the results:
- Required events = 247
- Estimated Number of Controls = 475
- Estimated Number of Experimental subjects = 475
- Estimated Total Number of Subjects = 950
Interpretation
For time-to-event studies, the required number of events is often the key design quantity. The total number of subjects depends on how frequently events are expected to occur. In this example, 247 events translate into 950 total subjects.
Cox regression — binary covariate
Use this procedure when the primary Cox model effect is a binary predictor, such as treatment group or presence/absence of a risk factor.
In the tutorial example, the inputs are:
- Hazard Ratio = 0.70
- Ratio of control to experimental subjects = 1
- Overall Event Proportion = 0.26
- R-squared with Other Covariates = 0
- Alpha = 0.050
- Beta = 0.200
- Covariate Type = Binary Covariate

From the results:
- Required events = 247
- Estimated Number of Subjects = 950
Interpretation
The calculation shows that the model needs 247 events, corresponding here to 950 subjects. The overall event proportion links the event requirement to the projected total sample size.
Cox regression — continuous covariate
Use this version when the predictor is continuous, such as a biomarker, age, or a quantitative measurement.
In the tutorial example, the inputs are:
- Hazard Ratio per Unit = 1.25
- Covariate Standard Deviation = 2.5
- Overall Event Proportion = 0.30
- R-squared with Other Covariates = 0.10
- Alpha = 0.050
- Beta = 0.200
- Covariate Type = Continuous Covariate

From the results:
- Required events = 29
- Estimated Number of Subjects = 97
Interpretation
Continuous-predictor planning uses the hazard ratio per unit increase together with the covariate standard deviation. In this example, the assumed effect size is strong enough that only 29 events and about 97 subjects are required.
Part IV — Reliability and agreement planning
Intraclass correlation (ICC)
Use ICC planning when the study goal is to estimate or test reliability across repeated observations, raters, or measurement methods.
In the tutorial example, the inputs are:
- Null ICC = 0.50
- Alternative ICC = 0.75
- Observations per Subject = 3
- One-sided Alpha = 0.050
- Beta = 0.200

From the results:
- Estimated Number of Subjects = 23
- Achieved power = 0.8049
Interpretation
With 3 observations per subject, the study requires 23 subjects to distinguish a reliability level of 0.75 from a null value of 0.50 at approximately 80% power.
Agreement planning for Bland–Altman analysis
Use this procedure when the goal is not to compare means or proportions, but to plan how precisely the limits of agreement will be estimated.
In the tutorial example, the inputs are:
- SD of Differences = 5
- Desired LoA CI Half-Width = 2
- LoA Multiplier = 1.96
- Alpha = 0.050

From the results:
- Estimated Number of Pairs = 74
- Achieved LoA CI half-width = 1.9887
Interpretation
This means the agreement study needs 74 paired measurements so that the confidence interval around the limits of agreement is narrow enough to achieve the desired precision target.
This is a different design goal from hypothesis testing. The focus is not on rejecting a null hypothesis, but on estimating agreement with acceptable precision.
How to choose the right sample size tool
A simple rule is:
- use Paired T-test for matched or before/after continuous outcomes
- use Unpaired T-test for two independent group means
- use Single Proportion for one-sample binary designs
- use Independent Proportions for two-group binary outcomes
- use Log-rank for two-group survival comparisons
- use Cox Regression when the primary analysis is a Cox model
- use ICC for reliability studies
- use Agreement (Bland–Altman) when the goal is precise limits-of-agreement estimation
Results match the workbook and UDF calculations
The companion workbook is designed to mirror the GUI examples shown in this tutorial.
The Scenario_Inputs sheet stores the planning assumptions.
The BESH_UDFs sheet evaluates the corresponding BESH.SSIZE.* functions directly in Excel.
That makes the workbook useful for two purposes:
- reproducing the dialog examples shown on this page
- building reusable planning templates for future studies
In other words, the GUI and the workbook are not separate workflows. They are two views of the same planning logic: one dialog-based and one formula-based.
Final remarks
Sample size planning is always only as good as the assumptions used. In real studies, the most important step is often not the button click, but choosing realistic values for:
- effect size
- standard deviation
- event proportion
- clinically acceptable margin
- reliability target
- desired agreement precision
BESHStatNG helps bring these design decisions into a clear, reproducible Excel workflow.
For full method details, assumptions, and formulas, see the online help and the sample size UDF documentation.