lists – Scripts & Statistics

How to use lists in RMarkdown and Quarto

RMarkdown and Quarto documents are “dynamic analysis documents that combine code, rendered output (such as figures), and prose”. Since everything in R is an object, many R objects (numbers, charts, tables, statistical models, etc.) are created while the document is rendered. In large documents, it can be difficult to keep track of where and when these objects are created, which can make debugging a cumbersome and time-consuming job.

In this blog post, I will show how lists can be used to keep these objects organised and improve clarity throughout the document. Let me give you an example:

Creating a `list` with named objects

In R, a “list” is itself an object and can be created with the base list() function. In the following code snippet, we save a list as an object (‘lst.a’) and add one named numeric object (‘nmb.1’). The str() function shows all objects our list contains.

lst.a <- list(
  nmb.1 = 10
)

str(lst.a)

## List of 1
##  $ nmb.1: num 10
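Once an object sits in a named list, it can be retrieved by its name anywhere later in the document. The following minimal sketch uses only base R; the extra element ‘nmb.2’ is purely illustrative.

```r
# Named list elements can be retrieved by name later on
lst.a <- list(
  nmb.1 = 10
)

lst.a$nmb.1        # access by name with $
lst.a[["nmb.1"]]   # access by name with [[ ]], also works with a name stored in a variable
names(lst.a)       # all element names stored in the list

# elements can also be added after the list has been created
lst.a$nmb.2 <- 20
str(lst.a)
```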

Reusing objects

Next, we try to create a new list and check whether it’s possible to use one object (‘nmb.1’) to compute another object (‘nmb.2’):

lst.b <- list(
  nmb.1 = 10,
  nmb.2 = nmb.1 + 10
)

## Error:
## ! object 'nmb.1' not found

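As we can see, list() cannot create objects and reuse them within the same call: its arguments are evaluated in the calling environment, where ‘nmb.1’ does not exist yet. Fortunately, the {tibble} package provides the lst() function, which evaluates its arguments in order and can therefore refer to elements created earlier in the same call: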

lst.b <- tibble::lst(
  nmb.1 = 10,
  nmb.2 = nmb.1 + 10
)
str(lst.b)

## List of 2
##  $ nmb.1: num 10
##  $ nmb.2: num 20
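If you would rather not depend on {tibble}, one possible base-R workaround (an alternative, not the approach used in this post) is to compute the values inside local(), where each assignment is visible to the following lines, and collect them into a list at the end. The name ‘lst.b2’ is hypothetical.

```r
# Base-R alternative to tibble::lst(): inside local(), nmb.1 already exists
# when nmb.2 is computed; only the final list() is returned
lst.b2 <- local({
  nmb.1 <- 10
  nmb.2 <- nmb.1 + 10
  list(nmb.1 = nmb.1, nmb.2 = nmb.2)
})

str(lst.b2)
```

The trade-off is that every name has to be written twice, which is exactly what lst() saves you.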

Saving plots in lists

Unfortunately, base R plots cannot be stored this way: plot() draws the figure as a side effect and returns NULL, so there is nothing to put into the list. When we evaluate the following lines of R code, the plot is drawn immediately, while the list element it was assigned to remains empty (NULL).

lst.c <- list(
  plt = plot(1:10)
)

[Figure: output of plot(1:10)]

lst.c$plt

## NULL
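A workaround for base R graphics, sketched here as an assumption rather than as part of the original approach, is to store a function that draws the plot and call it wherever the figure should appear. Base R's recordPlot() is another option that is not shown here.

```r
# Store a function instead of the (NULL) return value of plot()
lst.c <- list(
  plt = function() plot(1:10)
)

is.function(lst.c$plt)  # TRUE: the list now holds something reusable
# lst.c$plt()           # draws the plot when called, e.g. inside a later chunk
```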

However, the {ggplot2} package produces plots which can be assigned to objects and, thus, can be put into lists.

library(ggplot2)
library(magrittr)  # provides the %>% pipe

lst.d <- list(
  ggplt = data.frame(x = 1:10, y = 1:10) %>% ggplot(aes(x, y)) + geom_point()
)
lst.d$ggplt

[Figure: ggplot2 scatter plot stored in lst.d$ggplt]

Statistical models, data.frames etc.

It probably doesn’t come as a surprise that many more kinds of objects can be put into a list. In the following example, we use the ‘mtcars’ data.frame to build a simple linear model explaining horsepower (hp) by miles per gallon (mpg). At the same time, we put this model into a list (‘lst.e’) twice: as the model object itself (‘mdl.lm’, which is a list of class "lm") and as a tidy tibble created with broom::tidy() (‘tib.lm’).

lst.e <- tibble::lst(
  mdl.lm = lm(hp ~ mpg, data = mtcars),
  tib.lm = broom::tidy(mdl.lm)
)

str(lst.e,  list.len = 5)

## List of 2
##  $ mdl.lm:List of 12
##   ..$ coefficients : Named num [1:2] 324.08 -8.83
##   .. ..- attr(*, "names")= chr [1:2] "(Intercept)" "mpg"
##   ..$ residuals    : Named num [1:32] -28.7 -28.7 -29.8 -25.1 16 ...
##   .. ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##   ..$ effects      : Named num [1:32] -829.8 296.3 -23.6 -20 19.3 ...
##   .. ..- attr(*, "names")= chr [1:32] "(Intercept)" "mpg" "" "" ...
##   ..$ rank         : int 2
##   ..$ fitted.values: Named num [1:32] 139 139 123 135 159 ...
##   .. ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##   .. [list output truncated]
##   ..- attr(*, "class")= chr "lm"
##  $ tib.lm: tibble [2 x 5] (S3: tbl_df/tbl/data.frame)
##   ..$ term     : chr [1:2] "(Intercept)" "mpg"
##   ..$ estimate : num [1:2] 324.08 -8.83
##   ..$ std.error: num [1:2] 27.43 1.31
##   ..$ statistic: num [1:2] 11.81 -6.74
##   ..$ p.value  : num [1:2] 8.25e-13 1.79e-07
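Without {broom}, a similar coefficient table can be taken from summary() in base R. The sketch below keeps the model and that table side by side; the names ‘lst.e2’ and ‘coef.tab’ are hypothetical. Because plain list() cannot reuse objects within the same call, the model is fitted first.

```r
# Base-R sketch: keep the model and its coefficient table in one list.
# summary()$coefficients is a matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
mdl.lm <- lm(hp ~ mpg, data = mtcars)

lst.e2 <- list(
  mdl.lm   = mdl.lm,
  coef.tab = summary(mdl.lm)$coefficients
)

round(lst.e2$coef.tab, 2)
lst.e2$coef.tab["mpg", "Estimate"]  # slope of mpg
```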

Using lists with RMarkdown and Quarto

Finally, we save everything into a single list and use its elements in the text of the document.

library(ggplot2)
library(magrittr)  # provides the %>% pipe

lst.final <- tibble::lst(
  nmb.1 = nrow(mtcars),
  nmb.2 = ncol(mtcars),
  ggplt = mtcars %>% ggplot(aes(x = hp, y = mpg)) +
    geom_point() + geom_smooth(method = "lm", se = FALSE) + ggthemes::theme_clean(),
  mdl.lm = lm(hp ~ mpg, data = mtcars),
  tib.lm = broom::tidy(mdl.lm, conf.int = TRUE)
)

Putting text and `R` code together

The following example text shows how to combine prose and inline R code:

The ‘mtcars’ dataset consists of `r lst.final$nmb.1` rows and `r lst.final$nmb.2` columns.

This renders to:

The ‘mtcars’ dataset consists of 32 rows and 11 columns.
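Inline R expressions are evaluated while the document is rendered, and their values are pasted into the surrounding sentence. Outside of a document, a rough analogue is sprintf(), which fills placeholders with values looked up in a list. This is only an illustration of the idea, not how knitr works internally; the name ‘lst.txt’ is hypothetical.

```r
# Values live in a list; the sentence only refers to them
lst.txt <- list(
  nmb.1 = nrow(mtcars),  # integer: 32
  nmb.2 = ncol(mtcars)   # integer: 11
)

txt <- sprintf(
  "The 'mtcars' dataset consists of %d rows and %d columns.",
  lst.txt$nmb.1, lst.txt$nmb.2
)
cat(txt, "\n")
```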

The relation between horsepower and miles per gallon is visualised in the following figure:

lst.final$ggplt

Horsepower and miles per gallon

Statistical models can be easily put into a table using the {gtsummary} package:

library(gtsummary)

tbl_regression(lst.final$mdl.lm, intercept = TRUE) %>% 
  as_hux_table()

Characteristic	Beta	95% CI	p-value
(Intercept)	324	268, 380	<0.001
mpg	-8.8	-12, -6.2	<0.001
Abbreviation: CI = Confidence Interval
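The confidence intervals in this table can be cross-checked in base R with confint(), which by default returns 95% intervals for the coefficients of an "lm" model. A minimal sketch:

```r
# Base-R cross-check of the 95% confidence intervals shown in the table
mdl.lm <- lm(hp ~ mpg, data = mtcars)
ci <- confint(mdl.lm, level = 0.95)  # columns "2.5 %" and "97.5 %"
round(ci, 1)
```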

How to Check if a Date is Within a List of Intervals in R

Intro

I'm currently involved in a research project called EFFECT, a multicentre, cluster-randomised, placebo-controlled cross-over trial evaluating antiseptic body wash of patients in intensive care units (ICUs). The trial tests whether daily antiseptic body wash reduces the risk of ICU-acquired primary bacteraemia and of ICU-acquired multidrug-resistant organisms. EFFECT requires two types of data: (1) the patients' individual ward-movement histories and (2) microbiological test results (see Meissner 2017).

According to the study protocol, a positive blood test counts as an infection unless it is followed by a negative blood test within 48 hours.
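For a single positive test, the rule boils down to a date comparison. The sketch below uses plain base-R Date arithmetic (a Date counts in days, so 48 hours corresponds to adding 2) and dates taken from the example data used in this post. Including both ends of the 48-hour window is an assumption about how the protocol is read.

```r
# The 48-hour rule for one positive test, using plain Date arithmetic
pos.date  <- as.Date("2018-02-20")
neg.dates <- as.Date(c("2018-02-01", "2018-02-06", "2018-02-10",
                       "2018-02-21", "2018-04-05"))

# TRUE if at least one negative test lies in [pos.date, pos.date + 2];
# both window ends are included (assumption)
hit <- any(neg.dates >= pos.date & neg.dates <= pos.date + 2)

# a positive test only counts as an infection if no such negative test exists
counts.as.infection <- !hit
counts.as.infection
```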

In this blog post, I show how to solve this problem on a computational level.

The Problem

The following code chunk provides a hypothetical example of the microbiological data I have to deal with. The data frame df.mibi contains 4 variables:

ID: patient ID (only 1 patient in this example);
ORGANISM: name of the skin commensal organism found in the blood sample (empty for negative tests);
RESULT: laboratory test result (POS vs. NEG);
DATE: date of the laboratory test.

library(tidyverse)
library(lubridate)

df.mibi <- tibble(
  ID = paste0("ID_", rep(1, 11)),
  ORGANISM = c(rep('Propionibacterium acnes', 2), 
               rep('Staphylococcus epidermidis', 2),
               rep('Staphylococcus capitis', 2),
               rep('', 5)),
  RESULT = c(rep('POS', 6), rep('NEG', 5)),
  DATE = ymd(c(
    "2018-02-07", "2018-02-12", "2018-02-13", "2018-02-20",
    "2018-02-21", "2018-03-18", "2018-02-01", "2018-02-06",
    "2018-02-10", "2018-02-21", "2018-04-05")
  )
)

My Idea

In a first step, I separated df.mibi into two data frames:

df.POS: containing positive blood tests only
df.NEG: containing negative blood tests only

df.POS <- df.mibi %>%
  filter(RESULT == 'POS')
df.NEG <- df.mibi %>%
  filter(RESULT == 'NEG')
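The same separation can be sketched in base R with split(), which returns a named list holding one data frame per value of RESULT. This is an alternative to the filter() calls above, and the three rows are a subset of df.mibi chosen for illustration.

```r
# Base-R sketch: split() returns a named list with one data frame per RESULT value
df.small <- data.frame(
  ID     = "ID_1",
  RESULT = c("POS", "NEG", "POS"),
  DATE   = as.Date(c("2018-02-20", "2018-02-21", "2018-03-18")),
  stringsAsFactors = FALSE
)

parts <- split(df.small, df.small$RESULT)
names(parts)  # "NEG" "POS"
parts$POS
```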

In a second step, I removed two variables from df.NEG (RESULT, ORGANISM), grouped the data frame by ID, and used the nest() function of the tidyr package to put all dates belonging to one ID into the list column data:

df.NEG <- df.NEG %>%
  select(ID, DATE) %>%
    group_by(ID) %>%
      nest()
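Conceptually, nest() stores one small table per patient. A base-R analogue is a named list of Date vectors, one per ID, built with split(). This is only a sketch of the idea; the second patient ‘ID_2’ and its date are hypothetical.

```r
# Base-R analogue of nest(): a named list with all negative-test dates per patient
neg <- data.frame(
  ID   = c("ID_1", "ID_1", "ID_2"),  # ID_2 is hypothetical
  DATE = as.Date(c("2018-02-10", "2018-02-21", "2018-03-01")),
  stringsAsFactors = FALSE
)

neg.dates <- split(neg$DATE, neg$ID)  # the Date class is kept
str(neg.dates)
```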

This is what both data frames look like:

df.POS

## # A tibble: 6 x 4
##   ID    ORGANISM                   RESULT DATE      
##   <chr> <chr>                      <chr>  <date>    
## 1 ID_1  Propionibacterium acnes    POS    2018-02-07
## 2 ID_1  Propionibacterium acnes    POS    2018-02-12
## 3 ID_1  Staphylococcus epidermidis POS    2018-02-13
## 4 ID_1  Staphylococcus epidermidis POS    2018-02-20
## 5 ID_1  Staphylococcus capitis     POS    2018-02-21
## 6 ID_1  Staphylococcus capitis     POS    2018-03-18

df.NEG

## # A tibble: 1 x 2
##   ID    data            
##   <chr> <list>          
## 1 ID_1  <tibble [5 x 1]>

In a third step, I tried to check whether any of the negative tests (stored in the list column data) lies within the interval from the positive test to 48 hours later (TIME). I did the mapping using the map2() function of the purrr package:

# merging and mapping
df.TOTAL <- df.POS %>%
  left_join(df.NEG, by = 'ID') %>%
    mutate(TIME = interval(DATE, DATE + days(2)),
           RESULT = map2(data, "DATE", TIME, ~ .x %within% .y))

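Unfortunately, my code did not work. The RESULT variable should be logical and return TRUE if there is a negative test result up to 2 days after the positive test. Instead, it is a list column containing NULL. Looking at the call, map2() iterates over exactly two vectors: data is passed as .x, the string "DATE" as .y, and TIME ends up in the position of the function .f, so the formula ~ .x %within% .y is not used as the mapping function.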

df.TOTAL

## # A tibble: 6 x 6
##   ID    ORGANISM   RESULT DATE       data   TIME                          
##   <chr> <chr>      <list> <date>     <list> <S4: Interval>                
## 1 ID_1  Propionib~ <NULL> 2018-02-07 <tibb~ 2018-02-07 UTC--2018-02-09 UTC
## 2 ID_1  Propionib~ <NULL> 2018-02-12 <tibb~ 2018-02-12 UTC--2018-02-14 UTC
## 3 ID_1  Staphyloc~ <NULL> 2018-02-13 <tibb~ 2018-02-13 UTC--2018-02-15 UTC
## 4 ID_1  Staphyloc~ <NULL> 2018-02-20 <tibb~ 2018-02-20 UTC--2018-02-22 UTC
## 5 ID_1  Staphyloc~ <NULL> 2018-02-21 <tibb~ 2018-02-21 UTC--2018-02-23 UTC
## 6 ID_1  Staphyloc~ <NULL> 2018-03-18 <tibb~ 2018-03-18 UTC--2018-03-20 UTC

The Solution

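Not even one hour after I posted my question to StackOverflow, a user who calls himself “utubun” found the following solution. It maps over exactly two inputs (data and TIME), uses map2_lgl() so that the result is a logical vector, and wraps the check in any() so that each positive test yields a single TRUE or FALSE: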

df.TOTAL <- df.POS %>%
  left_join(df.NEG, by = 'ID') %>%
    mutate(TIME = interval(DATE, DATE + days(2)),
           RESULT = map2_lgl(data, TIME, ~ any(.x$DATE %within% .y)))
df.TOTAL

## # A tibble: 6 x 6
##   ID    ORGANISM   RESULT DATE       data   TIME                          
##   <chr> <chr>      <lgl>  <date>     <list> <S4: Interval>                
## 1 ID_1  Propionib~ FALSE  2018-02-07 <tibb~ 2018-02-07 UTC--2018-02-09 UTC
## 2 ID_1  Propionib~ FALSE  2018-02-12 <tibb~ 2018-02-12 UTC--2018-02-14 UTC
## 3 ID_1  Staphyloc~ FALSE  2018-02-13 <tibb~ 2018-02-13 UTC--2018-02-15 UTC
## 4 ID_1  Staphyloc~ TRUE   2018-02-20 <tibb~ 2018-02-20 UTC--2018-02-22 UTC
## 5 ID_1  Staphyloc~ TRUE   2018-02-21 <tibb~ 2018-02-21 UTC--2018-02-23 UTC
## 6 ID_1  Staphyloc~ FALSE  2018-03-18 <tibb~ 2018-03-18 UTC--2018-03-20 UTC

It works!!! Thank you very much! 🙂
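For readers who want to avoid purrr and lubridate, the same pairwise check can be sketched in base R. This is an alternative formulation, not utubun's solution: for each positive test, it looks up the patient's negative-test dates and tests whether any of them falls within [DATE, DATE + 2 days]. The object names ‘df.pos’ and ‘neg.by.id’ are hypothetical; the dates are the ones from df.mibi, so the result should match the RESULT column above.

```r
# Base-R sketch of the same check, without purrr/lubridate
df.pos <- data.frame(
  ID   = "ID_1",
  DATE = as.Date(c("2018-02-07", "2018-02-12", "2018-02-13",
                   "2018-02-20", "2018-02-21", "2018-03-18")),
  stringsAsFactors = FALSE
)

# negative-test dates per patient, the counterpart of the nested 'data' column
neg.by.id <- list(
  ID_1 = as.Date(c("2018-02-01", "2018-02-06", "2018-02-10",
                   "2018-02-21", "2018-04-05"))
)

df.pos$RESULT <- vapply(seq_len(nrow(df.pos)), function(i) {
  neg <- neg.by.id[[df.pos$ID[i]]]
  if (is.null(neg)) return(FALSE)   # patient without any negative test
  d <- df.pos$DATE[i]
  any(neg >= d & neg <= d + 2)      # both window ends included
}, logical(1))

df.pos
```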
