tidyverse – Scripts & Statistics

Intro

I'm currently involved in a research project called EFFECT. EFFECT is a multicentre, cluster-randomised, placebo-controlled cross-over trial evaluating antiseptic body wash of patients on intensive care units (ICU). The trial is to test whether daily antiseptic body wash reduces the risk of intensive care unit (ICU)-acquired primary bacteraemia and ICU-acquired multidrug-resistant organisms. EFFECT requires two types of data: (1) The patients' individual ward-movement history and
(2) microbiological test results (see Meissner 2017).

According to the study protocol, positive blood tests do count as infection unless there is a negative blood test within 48 hours after the positive blood test.

In this blog post, I show how to solve this problem on a computational level.

The Problem

The following code chunk provides an hypothetical example of the microbiological data I have to deal with. The data frame df.mibi contains 4 variables:

ID: Patient id (only 1 patient in this example);
ORGANISM: name of skin commensal organism found in some blood sample,
RESULT: laboratory test result (POS vs. NEG);
DATE: date of laboratory test

library(tidyverse)
library(lubridate)

df.mibi <- tibble(
  ID = paste0("ID_", rep(1, 11)),
  ORGANISM = c(rep('Propionibacterium acnes', 2), 
               rep('Staphylococcus epidermidis', 2),
               rep('Staphylococcus capitis', 2),
               rep('', 5)),
  RESULT = c(rep('POS', 6), rep('NEG', 5)),
  DATE = ymd(c(
    "2018-02-07", "2018-02-12", "2018-02-13", "2018-02-20",
    "2018-02-21", "2018-03-18", "2018-02-01", "2018-02-06",
    "2018-02-10", "2018-02-21", "2018-04-05")
  )
)

My Idea

In a first step, I separated df.mibi into two data frames:

df.POS: containing positive blood tests only
df.NEG: containing negative blood tests only

df.POS <- df.mibi %>%
  filter(RESULT == 'POS')
df.NEG <- df.mibi %>%
  filter(RESULT == 'NEG')

In a second step, I removed two variables from df.NEG (RESULT, ORGANISM), grouped the data frame by ID, and put all dates belonging to one ID into the list column data using the nest() function of the tidyr package

df.NEG <- df.NEG %>%
  select(ID, DATE) %>%
    group_by(ID) %>%
      nest()

This is how both data frames look like:

df.POS

## # A tibble: 6 x 4
##   ID    ORGANISM                   RESULT DATE      
##   <chr> <chr>                      <chr>  <date>    
## 1 ID_1  Propionibacterium acnes    POS    2018-02-07
## 2 ID_1  Propionibacterium acnes    POS    2018-02-12
## 3 ID_1  Staphylococcus epidermidis POS    2018-02-13
## 4 ID_1  Staphylococcus epidermidis POS    2018-02-20
## 5 ID_1  Staphylococcus capitis     POS    2018-02-21
## 6 ID_1  Staphylococcus capitis     POS    2018-03-18

df.NEG

## # A tibble: 1 x 2
##   ID    data            
##   <chr> <list>          
## 1 ID_1  <tibble [5 x 1]>

In a third step, I tried to check whether one of the negative test (stored in the list variable data) lies within the time interval positive test + 48 hours (TIME).
I did the mapping using the map2() function of the purrr package:

# merging and mapping
df.TOTAL <- df.POS %>%
  left_join(df.NEG, by = 'ID') %>%
    mutate(TIME = interval(DATE, DATE + days(2)),
           RESULT = map2(data, "DATE", TIME, ~ .x %within% .y))

Unfortunaltely, my code did not work. The RESULT variable should be logical and return TRUE in case of a negative test result up to 2 days after the positive test. Instead it is a list and returns NULL.

df.TOTAL

## # A tibble: 6 x 6
##   ID    ORGANISM   RESULT DATE       data   TIME                          
##   <chr> <chr>      <list> <date>     <list> <S4: Interval>                
## 1 ID_1  Propionib~ <NULL> 2018-02-07 <tibb~ 2018-02-07 UTC--2018-02-09 UTC
## 2 ID_1  Propionib~ <NULL> 2018-02-12 <tibb~ 2018-02-12 UTC--2018-02-14 UTC
## 3 ID_1  Staphyloc~ <NULL> 2018-02-13 <tibb~ 2018-02-13 UTC--2018-02-15 UTC
## 4 ID_1  Staphyloc~ <NULL> 2018-02-20 <tibb~ 2018-02-20 UTC--2018-02-22 UTC
## 5 ID_1  Staphyloc~ <NULL> 2018-02-21 <tibb~ 2018-02-21 UTC--2018-02-23 UTC
## 6 ID_1  Staphyloc~ <NULL> 2018-03-18 <tibb~ 2018-03-18 UTC--2018-03-20 UTC

The Solution

Not even one hour after I posted my question to StackOverflow, a user who calles himself “utubun” found the following solution:

df.TOTAL <- df.POS %>%
  left_join(df.NEG, by = 'ID') %>%
    mutate(TIME = interval(DATE, DATE + days(2)),
           RESULT = map2_lgl(data, TIME, ~ any(.x$DATE %within% .y)))
df.TOTAL

## # A tibble: 6 x 6
##   ID    ORGANISM   RESULT DATE       data   TIME                          
##   <chr> <chr>      <lgl>  <date>     <list> <S4: Interval>                
## 1 ID_1  Propionib~ FALSE  2018-02-07 <tibb~ 2018-02-07 UTC--2018-02-09 UTC
## 2 ID_1  Propionib~ FALSE  2018-02-12 <tibb~ 2018-02-12 UTC--2018-02-14 UTC
## 3 ID_1  Staphyloc~ FALSE  2018-02-13 <tibb~ 2018-02-13 UTC--2018-02-15 UTC
## 4 ID_1  Staphyloc~ TRUE   2018-02-20 <tibb~ 2018-02-20 UTC--2018-02-22 UTC
## 5 ID_1  Staphyloc~ TRUE   2018-02-21 <tibb~ 2018-02-21 UTC--2018-02-23 UTC
## 6 ID_1  Staphyloc~ FALSE  2018-03-18 <tibb~ 2018-03-18 UTC--2018-03-20 UTC

It works!!! Thank you very much! 🙂

Tag: tidyverse

How to Check if a Date is Within a List of Intervals in R

Intro

The Problem

My Idea

The Solution

Intro

The Problem

My Idea

The Solution

Share this: