Improve performance of case_when() and case_match()#6711
Improve performance of case_when() and case_match()#6711DavisVaughan merged 8 commits intotidyverse:mainfrom
case_when() and case_match()#6711Conversation
| .size <- vec_size_common(!!!conditions, !!!values, .size = .size, .call = error_call) | ||
| .size <- vec_size_common(!!!conditions, !!!values, .size = .size) | ||
|
|
||
| conditions <- vec_recycle_common(!!!conditions, .size = .size, .call = error_call) | ||
| values <- vec_recycle_common(!!!values, .size = .size, .call = error_call) |
There was a problem hiding this comment.
These were superfluous
| dots_env = dots_env, | ||
| error_call = error_call | ||
| ) | ||
| env_error_info <- new_environment() |
There was a problem hiding this comment.
An environment to track information used by with_case_errors()
| ! Can't recycle `..1 (LHS)` (size 2) to match `..1 (RHS)` (size 3). | ||
| Code | ||
| (expect_error(case_when(c(TRUE, FALSE) ~ 1, c(FALSE, TRUE, FALSE) ~ 2, c(FALSE, | ||
| TRUE, FALSE, NA) ~ 3))) | ||
| Output | ||
| <error/vctrs_error_incompatible_size> | ||
| Error in `case_when()`: | ||
| ! Can't recycle `c(TRUE, FALSE)` (size 2) to match `c(FALSE, TRUE, FALSE)` (size 3). | ||
| ! Can't recycle `..1 (LHS)` (size 2) to match `..2 (LHS)` (size 3). |
There was a problem hiding this comment.
Particularly important to show LHS vs RHS in these errors, otherwise it can be very confusing
| default_env = default_env, | ||
| dots_env = dots_env, | ||
| error_call = error_call | ||
| default_env = caller_env(), |
There was a problem hiding this comment.
It's interesting that this is expensive enough to care about.
There was a problem hiding this comment.
It is super minor honestly, but we've worked hard in vctrs to make env evaluation lazy and it seems a waste to throw that away
| } | ||
|
|
||
| with_case_errors <- function(expr, side, i, error_call) { | ||
| with_case_errors <- function(expr, error_call, env_error_info) { |
There was a problem hiding this comment.
Are you taking advantage of try_fetch() features here? Is it worth changing to a bare withCallingHandlers()?
There was a problem hiding this comment.
Good point, withCallingHandlers() is as fast as it gets. Probably still better to wrap from the outside though.
There was a problem hiding this comment.
Is it still worth having this as a separate function? It's kind of hard to follow now that the lexical envs are intertwined.
| names(lhs) <- paste0("..", seq_args, " (LHS)") | ||
| names(rhs) <- paste0("..", seq_args, " (RHS)") |
There was a problem hiding this comment.
Given that people don't seem to general understand LHS/RHS might be better to just use left/right.
| } | ||
|
|
||
| with_case_errors <- function(expr, side, i, error_call) { | ||
| with_case_errors <- function(expr, error_call, env_error_info) { |
There was a problem hiding this comment.
Good point, withCallingHandlers() is as fast as it gets. Probably still better to wrap from the outside though.
| } | ||
|
|
||
| with_case_errors <- function(expr, side, i, error_call) { | ||
| with_case_errors <- function(expr, error_call, env_error_info) { |
There was a problem hiding this comment.
Is it still worth having this as a separate function? It's kind of hard to follow now that the lexical envs are intertwined.
Rather than wrapping `eval_tidy()`, we now wrap the whole for loop, assuming that the only errors that can happen in the loop are related to expression evaluation. This avoids repeated calls to `try_fetch()`, which can be expensive due to `tryCatch()` being implemented in R rather than C. This is particularly important when there are many conditions, as the repeated `try_fetch()` calls have a noticeable performance impact
6b92b11 to
1d18013
Compare
Closes #6674
Three improvements to make these decently fast again:
Minor: Calls to
current_env()andcaller_env()are now lazy (only evaluated when needed) (needs Canas_function()avoid materializingcall? r-lib/rlang#1558)Major: We no longer call
as_label()to turn the expressions into labels which were being used as informative names that showed up in error messages. This stinks, butas_label()is just too slow to be used eagerly like this. We can revisit this again in the future if we add a lazy ALTREP character vector which could materialize theas_label()calls on demand (i.e. at error time). In the meantime, we now use positional names that also include LHS/RHS information, like..2 (RHS), which ends up being decent.Major: I was previously wrapping the evaluation of the lhs/rhs of the formulas in
with_case_errors()to give an informative indexed error about which formula failed to evaluate. This got slow as you increased the number of expressions incase_when()(2 * N expressions = number ofwith_case_errors()calls) because it usestry_fetch(), which is built ontryCatch(), which is in R not C (probably a good Luke project!). I've changed it up so 1 call towith_case_errors()now wraps the entire evaluation loop, and we track some index information to use in the error if one occurs.Sadly the private option added in rlang (r-lib/rlang@33db700) wasn't enough to make
as_label()fast enough to be used eagerly like this. But theinfix_overflows()call there is a big part of the performance degradation we see in my benchmarks below.When I work on #6678 that will end up also knocking off another 50us or so from switching away from
vec_assert()invec_case_when()andif_else().After that, any other improvements will come from moving
vec_case_when()to C in vctrs.Right now we are in the 2-4x slower range. After #6678 I expect to be 2x slower most of the time. I think we can do a release with that. Then the C implementation in vctrs should fix the rest.