Preample
“Sample Size Calculation Using R” – the title of this blog post – sounds very comprehensive. However, this post only deals with the sample size collection for one particular test (two sample t-test). A collection of functions for basic sample size calculation can be found in the pwr package.
Introduction
A couple of months ago, I attended some statistics course about sample size calculation and sample size adjustment. Amongst others, we spoke about a randomized placebo-controlled trial with patients suffering of depression. The trial's primary end point was the difference between placebo and treatment group in the HAM-D total score (some depression score) between baseline and the end of therapy. During the course, the professor asked us to write a “two-liner” returning the sample size per group for a two-sided, two sample t-test with the following parameters:
- alpha = 0.05,
- sigma = 8 ,
- delta = 4 and
- power ≤ 0.90
Here is my solution:
In a first step, I wrote a loop returning the power rising from 0.1 to 0.9 and the corresponding sample sizes respectively. The loop may either be a repeat or a while loop.
While Loop
A while loop runs and repeats while a specified condition returns TRUE. The loop terminates only when the condition evaluates to FALSE. In our case, the loop stops evaluating as soon as p > 0.9.
p <- 0.05
while(p <= 0.9){
p <- p + 0.05
t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
print(paste(t$power, ceiling(t$n), sep = ' - '))
}
Repeat Loop
In a repeat loop it is required to define a so-called break declaration to stop repeating the code. Just like the while loop, our repeat loop stops evaluating as soon as p > 0.9.
Both, the while and the repeat loop return the same result.
p <- 0.05
repeat{
p <- p + 0.05
t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
print(paste(t$power, ceiling(t$n), sep = ' - '))
if(p > 0.90){
break
}
}
## [1] "0.1 - 5" ## [1] "0.15 - 8" ## [1] "0.2 - 12" ## [1] "0.25 - 15" ## [1] "0.3 - 18" ## [1] "0.35 - 21" ## [1] "0.4 - 25" ## [1] "0.45 - 28" ## [1] "0.5 - 32" ## [1] "0.55 - 36" ## [1] "0.6 - 41" ## [1] "0.65 - 45" ## [1] "0.7 - 51" ## [1] "0.75 - 57" ## [1] "0.8 - 64" ## [1] "0.85 - 73" ## [1] "0.9 - 86"
For a very good introduction to loops, see Davies (2016): The Book of R. No Starch Press: San Francisco.
Storing the Results
In case we want to use the results for whatever purpose, we need to store them in a vector. The following example heavily borrows from StackOverflow. The loop returns a data frame containing two variables: pwr (power) and num (sample size).
p <- 0.05
pwr <- c()
num <- c()
while(p <= 0.9){
p <- p + 0.05
t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
pwr <- c(pwr, t$power)
num <- c(num, ceiling(t$n))
df <- data.frame(pwr = pwr, n = num)
}
rm(list = c(setdiff(ls(), c("df"))))
dplyr::glimpse(df)
## Observations: 17 ## Variables: 2 ## $ pwr <dbl> 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55... ## $ n <dbl> 5, 8, 12, 15, 18, 21, 25, 28, 32, 36, 41, 45, 51, 57, 64, ...
Plotting Power and Sample Size
Finally, we plot the relation between power and sample size using the ggplot2 package. The theme I use, stems from the ggpubr package.
ggplot(df, aes(pwr, n)) +
geom_point() +
scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) +
scale_y_continuous(breaks = seq(0, max(df$n)+10, 10)) +
geom_text(aes(label = n), nudge_y = 4) +
ggpubr::theme_pubr() +
labs(x = 'Power', y = 'Sample Size',
title = 'Power and Sample Size',
subtitle = expression('Two-sided, two sample t-test with' ~ sigma ~ '= 8,' ~ delta ~ '= 4,' ~ alpha ~ '= 0.05'))

As we see, a power of 0.90 requires a sample size of 86 patients per treatment group for a two-sided, two sample t-test with sd = 8, delta = 4, and alpha = 0.05.