R makes it easy to combine different kinds of plots into one overall graph. This is useful for showing both basic summary statistics (median, quartiles, etc.) and the distribution of a variable in a single figure. Moreover, so-called cut-off values can be added to the graph.
In this blog post, I show how to combine box and jitter plots using the ggplot2 package.
First of all, we need to install and load the R packages required for the following steps. Since we want to do the installation and loading using the pacman package, we first check whether pacman itself is available. If it is not, it is installed; if it is, require() simply loads it (line 1). Furthermore, we need the R packages ggplot2 and Hmisc. The p_load function checks whether these packages have been installed already and either installs and loads them or just loads them (line 2).
if (!require("pacman")) install.packages("pacman")
pacman::p_load(ggplot2, Hmisc)
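For readers who prefer not to depend on pacman, the same check can be sketched in base R. This is only an illustration of the idea, not what p_load does internally; the helper name missing_packages is made up for this example, and the sketch only reports what is missing instead of installing anything.

```r
# Hypothetical helper (not part of pacman): returns the names of the
# packages in `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "Hmisc"))

# Only report here; install.packages(to_install) would do the actual install.
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```

Once nothing is reported as missing, library(ggplot2) and library(Hmisc) load the packages as usual.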
In a second step, we create three random variables (var.scale, var.group, var.cutoff) with n=300.
- var.scale is a numeric variable with a mean value of about 50 and a standard deviation of about 17.
- var.group is a factor variable comprising the groups male and female.
- var.cutoff is derived from var.scale using predefined cut-off values (0–40 = low, 41–60 = medium, >60 = high).
var.scale <- round(rnorm(300, 50, 17))
var.group <- rbinom(300, 1, .5)
var.group <- factor(var.group,
                    levels = c(0:1),
                    labels = c("male", "female"))
var.cutoff <- ifelse(var.scale <= 40, 1,
                     ifelse(var.scale > 40 & var.scale <= 60, 2, 3))
var.cutoff <- factor(var.cutoff,
                     levels = c(3:1),
                     labels = c("high", "medium", "low"))
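As a side note, the nested ifelse() can also be written with base R's cut(). The sketch below uses its own demo vector (scale_demo) and a fixed seed, both of which are assumptions made only so the example is reproducible, and checks that the two approaches give the same groups.

```r
set.seed(1)  # assumption: seed chosen only to make this sketch reproducible
scale_demo <- round(rnorm(300, 50, 17))

# Nested ifelse(), as in the post
nested <- ifelse(scale_demo <= 40, 1,
                 ifelse(scale_demo <= 60, 2, 3))
nested <- factor(nested, levels = 3:1,
                 labels = c("high", "medium", "low"))

# Same bins with cut(): (-Inf, 40], (40, 60], (60, Inf)
binned <- cut(scale_demo,
              breaks = c(-Inf, 40, 60, Inf),
              labels = c("low", "medium", "high"))
# Reorder the levels so that "high" comes first, as in the post
binned <- factor(binned, levels = c("high", "medium", "low"))

# Both approaches should assign every value to the same group
all(as.character(nested) == as.character(binned))
```

With more than three groups, cut() tends to stay readable where nested ifelse() calls become hard to follow.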
The describe() function of the Hmisc package returns some basic descriptive statistics: the mean and selected quantiles for numeric variables, and frequencies for factors.
Hmisc::describe(var.scale)
## var.scale 
##       n missing  unique    Info    Mean     .05     .10     .25     .50 
##     300       0      71       1   51.25   24.00   30.90   41.00   50.00 
##     .75     .90     .95 
##   63.25   70.00   76.00 
## 
## lowest :   8  10  14  16  17, highest:  85  97 100 102 104
Hmisc::describe(var.group)
## var.group 
##       n missing  unique 
##     300       0       2 
## 
## male (141, 47%), female (159, 53%)
Hmisc::describe(var.cutoff)
## var.cutoff 
##       n missing  unique 
##     300       0       3 
## 
## high (87, 29%), medium (141, 47%), low (72, 24%)
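The same kind of summary can be cross-checked with base R alone. The following is a minimal sketch on separate demo data (scale_demo and group_demo, with an arbitrary seed), so the numbers will not match the output above.

```r
set.seed(1)  # assumption: seed chosen only to make this sketch reproducible
scale_demo <- round(rnorm(300, 50, 17))
group_demo <- factor(rbinom(300, 1, .5),
                     levels = 0:1,
                     labels = c("male", "female"))

# Quantiles at the same probabilities that describe() reports
q <- quantile(scale_demo, probs = c(.05, .10, .25, .50, .75, .90, .95))
q

# Counts and rounded percentages per group
counts <- table(group_demo)
shares <- round(100 * prop.table(counts))
counts
shares
```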
Since the ggplot2 package requires the variables to be in a data frame, we have to create a new data frame df comprising our predefined variables using the data.frame() function.
df <- data.frame(var.scale, var.cutoff, var.group)
The axis labels and the plot title are defined using the functions xlab(), ylab() and ggtitle().
Box plots will be created using the geom_boxplot() function, with width specifying the boxes' width :-).
Jitter plots will be created using the geom_jitter() function, with additional settings for the colour, position and size of the dots.
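ggplot(df) +
  xlab("Group") +
  ylab("Scale") +
  ggtitle("Combination of Box and Jitter Plot") +
  geom_boxplot(aes(var.group, var.scale),
               width = 0.5) +
  # Jitter horizontally only; height = 0 keeps each dot at its true scale value
  geom_jitter(aes(var.group, var.scale, colour = var.cutoff),
              position = position_jitter(width = .15, height = 0),
              size = 2) +
  # Limits cover the whole sample (the maximum was 104 in the describe() output),
  # so no observations are dropped from the plot
  scale_y_continuous(limits = c(0, 110),
                     breaks = seq(0, 110, 10)) +
  scale_color_manual(name = "Legend",
                     values = c("red", "blue3", "green3"))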

Finally, the last two calls in the code above format the Y-axis and the legend: scale_y_continuous() sets the axis limits and breaks, and scale_color_manual() sets the legend title and the colours of the dots.
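The cut-off values themselves can also be drawn into the graph as horizontal lines. The sketch below is one possible way to do this with geom_hline(), using its own demo data frame (demo) and seed. Placing the lines at 40.5 and 60.5, halfway between the integer scores, and hiding the box plot's own outlier dots with outlier.shape = NA (so that they are not drawn twice, once by each layer) are choices made for this example only.

```r
library(ggplot2)

set.seed(1)  # assumption: seed chosen only to make this sketch reproducible
demo <- data.frame(
  group = factor(rbinom(300, 1, .5), levels = 0:1,
                 labels = c("male", "female")),
  scale = round(rnorm(300, 50, 17))
)

p <- ggplot(demo, aes(group, scale)) +
  # outlier.shape = NA hides the box plot's outlier dots; the jitter layer
  # already shows every observation
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0, size = 2) +
  # Dashed lines between the low/medium and medium/high ranges
  geom_hline(yintercept = c(40.5, 60.5), linetype = "dashed")

p
```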





