loops – Scripts & Statistics

Preample

“Sample Size Calculation Using R” – the title of this blog post – sounds very comprehensive. However, this post only deals with the sample size collection for one particular test (two sample t-test). A collection of functions for basic sample size calculation can be found in the pwr package.

Introduction

A couple of months ago, I attended some statistics course about sample size calculation and sample size adjustment. Amongst others, we spoke about a randomized placebo-controlled trial with patients suffering of depression. The trial's primary end point was the difference between placebo and treatment group in the HAM-D total score (some depression score) between baseline and the end of therapy. During the course, the professor asked us to write a “two-liner” returning the sample size per group for a two-sided, two sample t-test with the following parameters:

alpha = 0.05,
sigma = 8 ,
delta = 4 and
power ≤ 0.90

Here is my solution:

In a first step, I wrote a loop returning the power rising from 0.1 to 0.9 and the corresponding sample sizes respectively. The loop may either be a repeat or a while loop.

While Loop

A while loop runs and repeats while a specified condition returns TRUE. The loop terminates only when the condition evaluates to FALSE. In our case, the loop stops evaluating as soon as p > 0.9.

p <- 0.05
while(p <= 0.9){
  p <- p + 0.05
  t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
  print(paste(t$power, ceiling(t$n), sep = ' - '))
}

Repeat Loop

In a repeat loop it is required to define a so-called break declaration to stop repeating the code. Just like the while loop, our repeat loop stops evaluating as soon as p > 0.9.

Both, the while and the repeat loop return the same result.

p <- 0.05
repeat{
  p <- p + 0.05
  t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
  print(paste(t$power, ceiling(t$n), sep = ' - '))
  if(p > 0.90){
    break
  }
}

## [1] "0.1 - 5"
## [1] "0.15 - 8"
## [1] "0.2 - 12"
## [1] "0.25 - 15"
## [1] "0.3 - 18"
## [1] "0.35 - 21"
## [1] "0.4 - 25"
## [1] "0.45 - 28"
## [1] "0.5 - 32"
## [1] "0.55 - 36"
## [1] "0.6 - 41"
## [1] "0.65 - 45"
## [1] "0.7 - 51"
## [1] "0.75 - 57"
## [1] "0.8 - 64"
## [1] "0.85 - 73"
## [1] "0.9 - 86"

For a very good introduction to loops, see Davies (2016): The Book of R. No Starch Press: San Francisco.

Storing the Results

In case we want to use the results for whatever purpose, we need to store them in a vector. The following example heavily borrows from StackOverflow. The loop returns a data frame containing two variables: pwr (power) and num (sample size).

p <- 0.05
pwr <- c()
num <- c()
while(p <= 0.9){
  p <- p + 0.05
  t <- power.t.test(sd = 8, delta = 4, sig.level=0.05, power = p, type="two.sample", alternative = "two.sided")
  pwr <- c(pwr, t$power)
  num <- c(num, ceiling(t$n))
  df <- data.frame(pwr = pwr, n = num)
}
rm(list = c(setdiff(ls(), c("df"))))
dplyr::glimpse(df)

## Observations: 17
## Variables: 2
## $ pwr <dbl> 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55...
## $ n   <dbl> 5, 8, 12, 15, 18, 21, 25, 28, 32, 36, 41, 45, 51, 57, 64, ...

Plotting Power and Sample Size

Finally, we plot the relation between power and sample size using the ggplot2 package. The theme I use, stems from the ggpubr package.

ggplot(df, aes(pwr, n)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) +
  scale_y_continuous(breaks = seq(0, max(df$n)+10, 10)) +
  geom_text(aes(label = n), nudge_y = 4) +
  ggpubr::theme_pubr() +
  labs(x = 'Power', y = 'Sample Size', 
       title = 'Power and Sample Size', 
        subtitle = expression('Two-sided, two sample t-test with' ~ sigma ~ '= 8,' ~ delta ~ '= 4,' ~ alpha ~ '= 0.05'))

plot of chunk unnamed-chunk-4

As we see, a power of 0.90 requires a sample size of 86 patients per treatment group for a two-sided, two sample t-test with sd = 8, delta = 4, and alpha = 0.05.