New dodging algorithm for box plots #2196
|
Can you include an example with continuous x? I think that's what I was thinking of for the for loop. |
|
It also seems like we need a little padding between the individual elements. Could we make that a parameter? |
|
Ohhh continuous x. Yeah you're right, this doesn't work well for that.
library("ggplot2")
## Without varwidth = TRUE it's ok aside from the padding issue
ggplot(diamonds, aes(y = price)) +
  geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65))
## With the changes from this PR and varwidth = TRUE
ggplot(diamonds, aes(y = price)) +
  geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65), varwidth = TRUE)
Here's how the second plot looks with the current dev version of ggplot.
ggplot(diamonds, aes(y = price)) +
  geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65), varwidth = TRUE)
#> Warning: position_dodge requires non-overlapping x intervals
for loop it is, then! |
|
Oh, and there is a larger problem with this: each pair of boxes is getting scaled separately to the group width, which makes the varying widths from |
|
Note that the order of the bars in the second example of the OP is different from the rest (I've confirmed it with a checkout of your current position-dodge branch). |
|
Thanks for pointing that out, @mcol. Right now the boxes get placed in the order in which they appear in the data passed to |
|
Also, my example above doesn't really have a continuous x variable since
set.seed(582)
dat <- data.frame(x = sample(4:5, size = 20, replace = TRUE),
                  y = rnorm(20),
                  class = sample(c("a", "b"), size = 20, replace = TRUE))
ggplot(dat, aes(x = x, y = y)) +
  geom_boxplot(aes(group = interaction(x, class), fill = class), varwidth = TRUE) |
This is still wrong though
…be dodged from one another i.e. if there are 2 boxes per group all widths get divided by 2
|
Alright, here is the new box plot algorithm. Below are a bunch of examples of how it looks in practice with continuous and discrete x variables and various combinations of
library("ggplot2")
## From the original example:
ggplot(data = iris, aes(Species, Sepal.Length)) +
  geom_boxplot(aes(colour = Sepal.Width < 3.2))
ggplot(data = iris, aes(Species, Sepal.Length)) +
  geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
## Diamonds data -- padding between boxes is less noticeable
ggplot(diamonds, aes(y = price)) +
  geom_boxplot(aes(x = cut_width(carat, 0.5), fill = depth < 65), varwidth = TRUE)
## A plot with truly continuous x
set.seed(582)
dat <- data.frame(x = sample(4:5, size = 20, replace = TRUE),
                  y = rnorm(20),
                  class = sample(c("a", "b"), size = 20, replace = TRUE))
ggplot(dat, aes(x = x, y = y)) +
  geom_boxplot(aes(group = interaction(x, class), fill = class))
## Truly continuous x and varwidth = TRUE
ggplot(dat, aes(x = x, y = y)) +
  geom_boxplot(aes(group = interaction(x, class), fill = class), varwidth = TRUE)
## Preserve total width
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
  geom_boxplot(position = position_boxdodge(preserve = "total"))
## Preserve individual width with varwidth = TRUE
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
  geom_boxplot(position = position_boxdodge(preserve = "single"), varwidth = TRUE)
## preserve = "total" is incompatible with varwidth = TRUE
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
  geom_boxplot(position = position_boxdodge(preserve = "total"), varwidth = TRUE)
#> Warning: Can't preserve total widths when varwidth = TRUE. |
R/position-boxdodge.r
Outdated
@@ -0,0 +1,116 @@
#' Position dodge for box plots
I'd like to give this a name that reflects that it works for any geom with variable widths - i.e. it would also work for geom_rect().
Or maybe position_flexdodge(), since it's a more flexible dodge that can handle variable width boxes and arbitrary rectangles?
R/position-boxdodge.r
Outdated
#' @export
#' @examples
#' ggplot(data = iris, aes(Species, Sepal.Length)) +
#'   geom_boxplot(aes(colour = Sepal.Width < 3.2))
R/position-boxdodge.r
Outdated
}

# xid represents groups of boxes that share the same position
df$xid <- match(df$x, sort(unique(df$x)))
Oh this is why it won't work with geom_rect(). It's only a few lines of code, so I think it's worth using the for-loop that I originally proposed
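For illustration, here is a minimal sketch of what such a for loop could look like. It assumes the data is already sorted by `xmin` and has `xmin`/`xmax` columns; the helper name `overlap_group_ids()` and the example rectangles are hypothetical and not taken from this PR.

```r
# Hypothetical helper: give every run of overlapping [xmin, xmax] intervals
# the same integer id. Assumes df has at least one row and is sorted by xmin.
overlap_group_ids <- function(df) {
  ids <- integer(nrow(df))
  ids[1] <- 1L
  group_end <- df$xmax[1]  # right edge of the current group so far
  for (i in seq_len(nrow(df))[-1]) {
    if (df$xmin[i] >= group_end) {
      # No overlap with anything in the current group: start a new one
      ids[i] <- ids[i - 1] + 1L
      group_end <- df$xmax[i]
    } else {
      # Overlaps the current group: reuse its id and extend its right edge
      ids[i] <- ids[i - 1]
      group_end <- max(group_end, df$xmax[i])
    }
  }
  ids
}

# Rectangles whose centres differ but whose extents overlap
rects <- data.frame(
  xmin = c(0.5, 0.8, 2.0, 2.5, 4.0),
  xmax = c(1.0, 1.5, 3.0, 2.8, 5.0)
)
rects$xid <- overlap_group_ids(rects)
```

Because the grouping here depends on interval overlap rather than on a shared `x` value, it would also cover rectangles that overlap without being centred on the same position, which is the case `match(df$x, sort(unique(df$x)))` misses.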
R/position-boxdodge.r
Outdated
df$xmax <- df$x + (df$new_width / 2)

# Find the total width of each group of boxes
group_sizes <- plyr::ddply(df, "xid", plyr::summarize, size = sum(new_width))
Could this be replaced by a tapply()? I'd prefer to not use plyr in new code, since one day I'd like to eliminate the dependency.
Oh yeah sure. aggregate() might be even better since it'll return a data frame.
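As a rough comparison of the two base R options, the sketch below sums `new_width` per `xid` both ways. The small `df` is made-up stand-in data, not the data frame used in the PR.

```r
# Hypothetical stand-in for the data at this point in the algorithm
df <- data.frame(
  xid = c(1, 1, 2, 2, 2),
  new_width = c(0.4, 0.5, 0.2, 0.3, 0.4)
)

# tapply() returns a named vector (1-d array) indexed by xid
sizes_vec <- tapply(df$new_width, df$xid, sum)

# aggregate() returns a data frame, closer to what ddply() produced
group_sizes <- aggregate(new_width ~ xid, data = df, FUN = sum)
names(group_sizes)[2] <- "size"
```

The `aggregate()` result can be merged or indexed like the old `ddply()` output, while the `tapply()` result would need to be looked up by name, e.g. `sizes_vec[as.character(df$xid)]`.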
R/position-boxdodge.r
Outdated
}

# x values get moved to between xmin and xmax
df$x <- rowMeans(df[, c("xmin", "xmax")])
I think df$x <- (df$xmin + df$xmax) / 2 would be clearer
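A quick check that the two spellings agree, using made-up `xmin`/`xmax` values:

```r
df <- data.frame(xmin = c(0.55, 1.05, 1.9), xmax = c(0.95, 1.45, 2.1))

# Midpoint written out explicitly
x_explicit <- (df$xmin + df$xmax) / 2

# Same values via rowMeans(), which returns a vector named by row names
x_rowmeans <- rowMeans(df[, c("xmin", "xmax")])
```

The values match; the explicit form just makes it obvious that `x` is the midpoint of the box.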
tests/testthat/test-geom-boxplot.R
Outdated
overlaps <- vector(length = nrow(d) - 1)
for (i in 2:nrow(d)) {
  if (d$xmin[i] < d$xmax[i - 1]) {
    print(i)
tests/testthat/test-geom-boxplot.R
Outdated
overlaps <- vector(length = nrow(d) - 1)
for (i in 2:nrow(d)) {
  if (d$xmin[i] < d$xmax[i - 1]) {
This for loop could become (with a small modification) a separate group_overlapping function, and you could use it above.
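One possible shape for such a helper is sketched below. Only the name `group_overlapping` comes from the comment above; the body is an assumption, written here as a vectorised equivalent of the loop, and it expects `d` to be sorted by `xmin` with `xmin <= xmax` in every row.

```r
# Sketch of a group_overlapping() helper; the body is an assumption.
group_overlapping <- function(d) {
  n <- nrow(d)
  if (n < 2) return(rep(1L, n))
  # A new group starts wherever xmin reaches the furthest xmax seen so far
  starts_new <- d$xmin[-1] >= cummax(d$xmax)[-n]
  cumsum(c(TRUE, starts_new))
}

# Made-up layer data: one correctly dodged, one with overlapping boxes
dodged  <- data.frame(xmin = c(0.6, 1.0, 1.6), xmax = c(1.0, 1.4, 2.0))
clashed <- data.frame(xmin = c(0.6, 0.9, 1.6), xmax = c(1.0, 1.4, 2.0))

# No overlaps means every box ends up in its own group
no_overlap_dodged  <- !anyDuplicated(group_overlapping(dodged))
no_overlap_clashed <- !anyDuplicated(group_overlapping(clashed))
```

In a test, the same function could then replace the hand-written loop: a layer passes if no group id is repeated, and the position code could reuse the ids to decide which boxes need dodging against each other.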
|
The default padding looks slightly too big to my eyes; maybe try 0.05. It would also be handy to show a couple of examples using it with other geoms (i.e. |
|
I did some more research and it'll take a bit more work before |
|
Ok. Let's make it the default for boxplots and bars and then take a look at the others later. |
|
I've made some slight changes to hopefully make this as easy to use for both boxes and bars as possible. Something has gone wrong again with the ordering of the boxes/bars compared to the legend, I'll track that down and fix it.
ggplot(data = iris, aes(Species, Sepal.Length)) +
  geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
  geom_bar(position = "dodge2") |
|
This is looking great! (Although now padding of 0.05 looks too small. Maybe switch back to 0.10?) |
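To get a feel for the difference between 0.05 and 0.10, here is a small arithmetic sketch. It assumes padding is a proportion of each box's width removed symmetrically around its centre; `apply_padding()` is a hypothetical helper, not a function from this PR.

```r
# Hypothetical helper: shrink each box around its centre by a
# proportion `padding` of its width (an assumed definition of padding).
apply_padding <- function(xmin, xmax, padding) {
  centre <- (xmin + xmax) / 2
  half   <- (xmax - xmin) * (1 - padding) / 2
  data.frame(xmin = centre - half, xmax = centre + half)
}

# Two dodged boxes of equal width that touch at x = 1
xmin <- c(0.55, 1.00)
xmax <- c(1.00, 1.45)

narrow_gap <- apply_padding(xmin, xmax, padding = 0.05)
wide_gap   <- apply_padding(xmin, xmax, padding = 0.10)

# Gap between the two boxes under each setting
gap_005 <- narrow_gap$xmin[2] - narrow_gap$xmax[1]
gap_010 <- wide_gap$xmin[2] - wide_gap$xmax[1]
```

Under this definition the gap scales linearly with the padding value, so going from 0.05 to 0.10 doubles the visible space between neighbouring boxes.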
…order to match the legend order
|
Soooo close. I am rethinking the defaults that I changed above. Right now this adds padding between the boxes because
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
  geom_boxplot()
But if someone wants to change the
ggplot(mtcars, aes(factor(cyl), y = mpg, fill = factor(vs))) +
  geom_boxplot(position = position_dodge2(preserve = "total"))
which results in no padding (they'd need to also add |
|
Yeah, I think that's a good idea. I rather like the padding between bars too. |
hadley
left a comment
Feel free to merge once you've made the news and doc tweaks.
NEWS.md
Outdated
# ggplot2 2.2.1.9000

* Box plot position is now controlled by `position_dodge2()` (@karawoo,
  #2143).
And bar. And you should add a brief description.
R/position-dodge2.r
Outdated
@@ -0,0 +1,154 @@
#' Alternate method for dodging overlapping objects
Maybe it's worth documenting together with position_dodge()?
|














Fixes #2143. This creates a new ggproto object for box plot positions, PositionBoxdodge, and a function pos_boxdodge() that can handle regular and variable-width boxes. collide() gets split into several functions. collide_setup() does the initial setup of the data and gets called by both collide() (which behaves the same as it has been) and collide_box().

@hadley this is slightly different than the algorithm we talked about because it doesn't use a for loop to find the non-overlapping groups. The x value in the data already seems to capture that information, so I think we can avoid the loop, but maybe I'm missing something...

With these changes, box plots look like this: