How to create a descriptive summary table (‘table 1’) using R

Intro

“Table 1”, that is, a table describing the sample characteristics of an empirical study or clinical trial, is an obligatory part of scientific publications. Since I started using {R} some ten years ago, I have come across a couple of packages and functions that aim to create such a table. In this blog post, I’ll give an overview of two of these packages and show how to use them.

Data

The data.frame I use is named ‘trial’ and is part of the {gtsummary} package. It contains the following variables:

head(gtsummary::trial)
## # A tibble: 6 x 8
##   trt      age marker stage grade response death ttdeath
##   <chr>  <dbl>  <dbl> <fct> <fct>    <int> <int>   <dbl>
## 1 Drug A    23  0.16  T1    II           0     0    24  
## 2 Drug B     9  1.11  T2    I            1     0    24  
## 3 Drug A    31  0.277 T1    II           0     0    24  
## 4 Drug A    NA  2.07  T3    III          1     1    17.6
## 5 Drug A    51  2.77  T4    III          1     1    16.4
## 6 Drug B    39  0.613 T4    I            0     1    15.6

As the output shows, the data.frame contains a treatment variable (“trt”) with two categories (“Drug A” vs. “Drug B”) as well as several categorical and numerical variables.

{finalfit}

According to the package description, the {finalfit} package has the following purposes:

Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and ‘Word’ using ‘RMarkdown’.

However, with summary_factorlist(), the package also includes a function to create a table with summary statistics.

library(finalfit)
library(dplyr)
tab.ff <- gtsummary::trial %>%
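  # convert the 0/1 indicators to factors (lambda form; passing extra
  # arguments through across() is deprecated in recent dplyr versions)
  mutate(across(
    c(response, death),
    ~ factor(.x, levels = c(1, 0), labels = c("yes", "no"))
  )) %>%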
  summary_factorlist(
    dependent = "trt", # name of grouping / treatment variable
    explanatory = c("age", "grade", "marker", "stage", "response", "death", "ttdeath"),
    total_col = TRUE, # add column with statistics for the whole sample
    add_row_total = TRUE, # add column with number of valid cases
    include_row_missing_col = FALSE,
    na_include = TRUE # make variables' missing data explicit
  )
tab.ff

Rather than printing a table, the summary_factorlist() function returns a data.frame, which must be processed further and piped to a printing function. The following example shows how these steps may be done using the {labelled} and {kableExtra} packages.

library(labelled)
library(dplyr)
library(kableExtra, exclude = "group_rows")
gtsummary::trial %>%
  mutate(across(
    c(response, death),
    ~ factor(.x, levels = c(1, 0), labels = c("yes", "no"))
  )) %>%
  # Add variable labels
  set_variable_labels(
    age = "Age [yrs]",
    marker = "Marker Level [ng/mL]",
    stage = "T Stage",
    grade = "Grade",
    response = "Tumor Response",
    death = "Patient Died",
    ttdeath = "Months to Death/Censor"
  ) %>%
  summary_factorlist(
    dependent = "trt", # name of grouping / treatment variable
    explanatory = c("age", "grade", "marker", "stage", "response", "death", "ttdeath"),
    total_col = TRUE, # add column with statistics for the whole sample
    add_row_total = TRUE, # add column with number of valid cases
    include_row_missing_col = FALSE,
    na_include = TRUE # make variables' missing data explicit
  ) %>%
  kbl(
    caption = "Baseline characteristics",
    booktabs = TRUE,
    col.names = c(
      " ", "Total N", " ",
      "Drug A", "Drug B", "Total"
    ),
    align = "lrlrrr",
  ) %>%
  kable_classic(full_width = FALSE)

{gtsummary}

The description of the {gtsummary} package gives the following information:

The package creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.

For creating a table with summary statistics, the tbl_summary() function is required. In addition, the {dplyr} package should be loaded.

library(gtsummary)
library(dplyr)

The creation of the table works best using the pipe operator:

tbl.gts <- trial %>%
  # categorical variables must be factors or character strings
  mutate(across(
    c(response, death),
    ~ factor(.x, levels = c(1, 0), labels = c("yes", "no"))
  )) %>%
  # apply the tbl_summary() function
  tbl_summary(
    by = trt, # Treatment variable
    label = list(
      age ~ "Age [yrs]",
      marker ~ "Marker Level [ng/mL]",
      stage ~ "T Stage",
      grade ~ "Grade",
      response ~ "Tumor Response",
      death ~ "Patient Died",
      ttdeath ~ "Months to Death/Censor"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    # may be also ...
    # statistic = list(all_continuous() ~ "{median} ({p25}, {p75})"),
    digits = all_continuous() ~ 1,
    missing_text = "(Missing)",
    include = everything() # select variables to be included into the table
  ) %>%
  add_overall() %>% # add column with statistics for the whole sample
  add_n() # add column with number of valid cases
tbl.gts

Characteristic            N     Overall, N = 200¹   Drug A, N = 98¹   Drug B, N = 102¹
Age [yrs]                 189   47.2 (14.3)         47.0 (14.7)       47.4 (14.0)
    (Missing)                   11                  7                 4
Marker Level [ng/mL]      190   0.9 (0.9)           1.0 (0.9)         0.8 (0.8)
    (Missing)                   10                  6                 4
T Stage                   200
    T1                          53 (27%)            28 (29%)          25 (25%)
    T2                          54 (27%)            25 (26%)          29 (28%)
    T3                          43 (22%)            22 (22%)          21 (21%)
    T4                          50 (25%)            23 (23%)          27 (26%)
Grade                     200
    I                           68 (34%)            35 (36%)          33 (32%)
    II                          68 (34%)            32 (33%)          36 (35%)
    III                         64 (32%)            31 (32%)          33 (32%)
Tumor Response            193   61 (32%)            28 (29%)          33 (34%)
    (Missing)                   7                   3                 4
Patient Died              200   112 (56%)           52 (53%)          60 (59%)
Months to Death/Censor    200   19.6 (5.3)          20.2 (5.0)        19.0 (5.5)
¹ Mean (SD); n (%)


Since the {gtsummary} package contains functions that convert a {gtsummary} object to the object types required by other popular table-specific R packages, e.g. as_kable_extra(), as_flex_table() etc., {gtsummary} tables can easily be rendered as .docx, .pdf or .html. The following example shows how to print a {gtsummary} object using the {kableExtra} package.

library(kableExtra)
tbl.gts %>%
  as_kable_extra(
    caption = "Baseline characteristics",
    booktabs = TRUE,
    align = "lcccc",
  ) %>%
  kable_classic(full_width = FALSE)
Baseline characteristics

Characteristic            N     Overall, N = 200    Drug A, N = 98    Drug B, N = 102
Age [yrs]                 189   47.2 (14.3)         47.0 (14.7)       47.4 (14.0)
    (Missing)                   11                  7                 4
Marker Level [ng/mL]      190   0.9 (0.9)           1.0 (0.9)         0.8 (0.8)
    (Missing)                   10                  6                 4
T Stage                   200
    T1                          53 (27%)            28 (29%)          25 (25%)
    T2                          54 (27%)            25 (26%)          29 (28%)
    T3                          43 (22%)            22 (22%)          21 (21%)
    T4                          50 (25%)            23 (23%)          27 (26%)
Grade                     200
    I                           68 (34%)            35 (36%)          33 (32%)
    II                          68 (34%)            32 (33%)          36 (35%)
    III                         64 (32%)            31 (32%)          33 (32%)
Tumor Response            193   61 (32%)            28 (29%)          33 (34%)
    (Missing)                   7                   3                 4
Patient Died              200   112 (56%)           52 (53%)          60 (59%)
Months to Death/Censor    200   19.6 (5.3)          20.2 (5.0)        19.0 (5.5)
¹ Mean (SD); n (%)
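
The other converters follow the same pattern. As a minimal sketch, assuming the {flextable} package is installed, the following code converts a small summary table with as_flex_table() and writes it to a Word file with flextable::save_as_docx(). The reduced variable selection, the object names and the temporary output path are illustrative choices and not part of the example above.

```r
library(gtsummary)

# Small summary table (two variables only, to keep the sketch short)
tbl_small <- tbl_summary(trial, by = trt, include = c(age, grade))

# Convert the gtsummary object to a flextable object
ft <- as_flex_table(tbl_small)

# Write the table to a .docx file (illustrative path in the temp directory)
docx_path <- file.path(tempdir(), "table1.docx")
flextable::save_as_docx(ft, path = docx_path)
```

For a real report, docx_path would simply point to the desired output file.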

Other packages

Other packages I have used in the past are {qwraps2} and {tableone}. However, the {gtsummary} package seems to offer the most convenient way to create a “table one”.

R Markdown: How to place two tables side by side using ‘knitr’ and ‘kableExtra’

Intro

When I was recently writing a report using R Markdown, I wanted to place two rather small tables side by side. Since I usually use the kable() function of the knitr package together with the kableExtra package to print tables, I tried to find a solution for my problem using both packages.

Since my Google search (“two tables side by side with kableExtra” or something similar) did not return a helpful result, I experimented with some table formatting options provided by the kableExtra package. Here is my solution.

Packages and data

For printing the tables we need to install and load two packages: knitr and kableExtra. The dplyr package is required for some data manipulation. The data we want to put into the tables come from the bundesligR package, which contains the final tables of Germany's highest football (soccer) league. We want to place the final tables of two seasons (1985/86 and 2015/16) side by side.

library(dplyr)
library(knitr)
library(kableExtra)

df <- bundesligR::bundesligR
table.1985 <- df %>%
  filter(Season == 1985) %>%
  select(Position, Team, Points)
table.2015 <- df %>%
  filter(Season == 2015) %>%
  select(Position, Team, Points)

Now, we place both tables side by side using some functionality of the kableExtra package:

table.1985 %>%
  kable("html", align = 'clc', caption = 'Bundesliga, Season 1985/86') %>%
  kable_styling(full_width = FALSE, position = "float_left")

table.2015 %>%
  kable("html", align = 'clc', caption = 'Bundesliga, Season 2015/16') %>%
  kable_styling(full_width = FALSE, position = "right")

Bundesliga, Season 1985/86                     Bundesliga, Season 2015/16
Position  Team                       Points    Position  Team                       Points
1         FC Bayern Muenchen         70        1         FC Bayern Muenchen         88
2         Werder Bremen              69        2         Borussia Dortmund          78
3         FC Bayer 05 Uerdingen      64        3         Bayer 04 Leverkusen        60
4         Borussia Moenchengladbach  57        4         Borussia Moenchengladbach  55
5         VfB Stuttgart              58        5         FC Schalke 04              52
6         TSV Bayer 04 Leverkusen    55        6         1. FSV Mainz 05            50
7         Hamburger SV               56        7         Hertha BSC                 50
8         SV Waldhof Mannheim        44        8         VfL Wolfsburg              45
9         VfL Bochum                 46        9         1. FC Koeln                43
10        FC Schalke 04              41        10        Hamburger SV               41
11        1. FC Kaiserslautern       40        11        FC Ingolstadt 04           40
12        1. FC Nuernberg            41        12        FC Augsburg                38
13        1. FC Koeln                38        13        Werder Bremen              38
14        Fortuna Duesseldorf        40        14        SV Darmstadt 98            38
15        Eintracht Frankfurt        35        15        TSG 1899 Hoffenheim        37
16        Borussia Dortmund          38        16        Eintracht Frankfurt        36
17        1. FC Saarbruecken         27        17        VfB Stuttgart              33
18        Hannover 96                23        18        Hannover 96                25

The trick is to set the position argument to float_left (left table) and right (right table). Furthermore, the argument full_width must be set to FALSE in both tables.

To Do

Unfortunately, the given example only works for rendering HTML documents. Does anyone know how to place two tables side by side when the output format is PDF/LaTeX?
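
One route that may be worth trying for PDF output: knitr itself can set several tables next to each other, either by passing a list of data frames to kable() or by wrapping several kable() results in kables(). The sketch below assumes a reasonably recent knitr version that provides kables(). The built-in data sets cars and mtcars stand in for the two league tables, and I have not verified the result against the Bundesliga example.

```r
library(knitr)

# Built-in data sets stand in for the two league tables
t1 <- head(cars, 3)
t2 <- head(mtcars[, 1:3], 3)

# kables() takes a list of kable() results and places them side by side
side_by_side <- kables(
  list(
    kable(t1, format = "latex", booktabs = TRUE),
    kable(t2, format = "latex", booktabs = TRUE)
  ),
  format = "latex",
  caption = "Two tables side by side"
)
side_by_side
```

Inside an R Markdown document rendered to PDF, the format arguments can usually be left out, because knitr picks the output format on its own.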

How to print tables with absolute and relative values in R

Introduction

In R, there are several ways to generate tables. While the table() function generates tables with absolute numbers, the prop.table() function returns tables with relative values (proportions, which can be multiplied by 100 to obtain percentages). However, I couldn't find a function that returns a table with both absolute and relative values.

In this blog post, I show how to generate such a table.
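
For reference, the two base functions mentioned above behave as sketched here. The small factor is made up for illustration and is not the data used in the rest of this post.

```r
# Small illustrative factor (made-up values)
sex <- factor(c("Male", "Female", "Female", "Male", "Female"))

abs_tab <- table(sex)          # absolute frequencies
rel_tab <- prop.table(abs_tab) # proportions that sum to 1

abs_tab
round(100 * rel_tab, 1)        # proportions expressed as percentages
```

Note that prop.table() expects the result of table() as its input, not the raw variable.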

Generate a random dataframe

In the first code snippet, we generate a random dataframe with two variables: Sex and Age. As you can see, generating random dataframes is easy and straightforward with Tyler Rinker's wakefield package.

library(wakefield)
df <- r_data_frame(n=200, sex, age)

Write the function

My function tab.func() combines three R functions:

  • describe() from the Hmisc package to return an object of class describe containing absolute and relative frequency values of a factor variable. To access these values, we need to subset this object using $values. This will return a matrix with the desired values.

  • t() to transpose this matrix, and

  • as.table() to transform this matrix into a table.

tab.func <- function (x) {
  y <- as.table(t(Hmisc::describe(x)$values))
  colnames(y) <- c('**n**', '**%**')
  return(y)
}

(UPDATE: In version 4 of the Hmisc package, the describe() function was rewritten. My function only works up to version 3.17.4)

The double asterisks around n and % are Markdown code used to return bold text.
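
Because tab.func() depends on an old Hmisc version, a variant built only on base R may be useful. The following is a sketch under that assumption and not a drop-in copy of the function above: the name tab_abs_rel, the digits argument and the example vector are all hypothetical.

```r
# Base-R sketch: absolute and relative frequencies of one categorical variable
# (tab_abs_rel is a hypothetical name, not part of any package)
tab_abs_rel <- function(x, digits = 1) {
  n <- table(x)                               # absolute frequencies
  pct <- round(100 * prop.table(n), digits)   # percentages
  out <- cbind(as.vector(n), as.vector(pct))  # one row per category
  dimnames(out) <- list(names(n), c("**n**", "**%**"))
  out
}

# Made-up example vector
grade <- c("I", "II", "II", "III", "I", "II")
res <- tab_abs_rel(grade)
res
```

The result is a plain numeric matrix with one row per category, so it can be passed to knitr::kable() in the same way as the original table.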

Deploy the function

In the following code snippet, we apply this function to a categorical variable (Sex) which is part of the dataframe df.

mytable <- tab.func(df$Sex)


knitr::kable(mytable,
             caption = 'Table with absolute numbers and percentages')
n %
A Male, Female 96, 104

Finally, we print this table with the kable() function from the knitr package.
