Skip to content

Conversation

@stephhazlitt
Copy link
Contributor

@stephhazlitt stephhazlitt commented Nov 29, 2021

This is my first PR contributing (or an attempt to contribute) to {arrow}.

This PR:

• removes warn_types warning that factors are converted to strings, which is no longer true https://github.com/apache/arrow/blob/master/r/R/dplyr-functions.R#L911-L920
• updates the test by removing the warning https://github.com/apache/arrow/blob/master/r/tests/testthat/test-dplyr-funcs-conditional.R#L130 however it does not remove the mutate() in the test as suggested in the TODO, if removed the test fails?
• [UPDATE] test includes a reset of the levels of all factor columns to pass, since Arrow if_else() kernel does not preserve unused factor levels (ARROW-14649)

@github-actions
Copy link

@nealrichardson
Copy link
Member

if removed the test fails?

Fails how?

@stephhazlitt
Copy link
Contributor Author

With the mutate line removed (line 128)

e.g.

 compare_dplyr_binding(
    .input %>%
      mutate(
        y = if_else(int > 5, fct, factor("a"))
      ) %>%
      collect(),
    tbl
  )
── Failure (test-dplyr-funcs-conditional.R:119:3): if_else and ifelse ───────────────────────────
`object` (`actual`) not equal to `expected` (`expected`).

`levels(actual$y)[1:4]`:   "a"             "g" "h" "i"
`levels(expected$y)[1:7]`: "a" "b" "c" "d" "g" "h" "i"

  `actual$y[4:10]`: NA 1 NA 2 3 4 5
`expected$y[4:10]`: NA 1 NA 5 6 7 8
Backtrace:
 1. compare_dplyr_binding(...) test-dplyr-funcs-conditional.R:119:2
 2. expect_equal(via_table, expected, ...) helper-expectation.R:129:4
 3. testthat::expect_equal(...) helper-expectation.R:42:4```

@ianmcook
Copy link
Member

There is a similar test here, in which I solved the same failure by using:

transmute(across(where(is.factor), ~ factor(.x, levels = c(...))))

https://github.com/apache/arrow/pull/11272/files?authenticity_token=86nJx4XEwecmmySQatP4CD6%2BYi62xBNQWT4vtOE0iIzJT6wH0ZJM3imNvGEBkfiPii6v0VXSUmag%2B5%2F8ycn0CQ%3D%3D&file-filters%5B%5D=.R#diff-c63a873ac0b560d2ca7229ac1df2628df8396b0cc51fc247b71a2499c492fe3aR317-R320

I think you can use that approach here too!

@stephhazlitt
Copy link
Contributor Author

Thanks @ianmcook, I replaced the as.character mutate with your approach above and added a comment to reference ARROW-14649. And apologies, I should have added a co-author to that commit.

Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Ian Cook <ianmcook@gmail.com>
@ianmcook ianmcook closed this in b83e6b0 Nov 29, 2021
@ursabot
Copy link

ursabot commented Nov 29, 2021

Benchmark runs are scheduled for baseline = 4913352 and contender = b83e6b0. b83e6b0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Finished ⬇️0.8% ⬆️0.13%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants