Skip to content

Conversation

@nealrichardson
Copy link
Member

@nealrichardson nealrichardson commented Feb 5, 2021

Includes dependency toolchain updates (utf8proc, re2) and bindings for

  • "nchar", "str_length" -> "binary_length"
  • "tolower", "str_to_lower" -> "utf8_lower"
  • "toupper", "str_to_upper" -> "utf8_upper"
  • "str_trim" is handled a little differently since it's one function in stringr but three functions in arrow (utf8_ltrim_whitespace, utf8_rtrim_whitespace, utf8_trim_whitespace)

@github-actions
Copy link

github-actions bot commented Feb 5, 2021

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit homebrew-r-autobrew

@github-actions
Copy link

github-actions bot commented Feb 5, 2021

Revision: 66d81b82423cca02bf7c96a417de0b6613db9e99

Submitted crossbow builds: ursacomputing/crossbow @ actions-79

Task Status
homebrew-r-autobrew Github Actions

@nealrichardson nealrichardson marked this pull request as ready for review February 5, 2021 21:54
@nealrichardson
Copy link
Member Author

Aside from possibly adding more tests here (arguably redundant since the functions themselves are tested in C++, but perhaps worth more assurance that they behave as an R user would expect), this is done. Other TODOs have been made into followup JIRAs.

Comment on lines 266 to 271
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would pass without the changes in this PR. Do you want to try to test toupper() and tolower() in a way that would have failed before but succeeds now?

Copy link
Member Author

@nealrichardson nealrichardson Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky with the R string functions like this that start with

    if (!is.character(x)) 
        x <- as.character(x)

because that will work by coercing the Array data silently.

Maybe we can test that by defining as.character <- function(...) stop("🙅") in the test?

Copy link
Member Author

@nealrichardson nealrichardson Feb 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this. You can't just define as.character in the test because you need to replace the one that toupper calls. You can't use testthat::with_mock to mock base functions anymore, and my other hack of using trace to insert a crash seems to crash too many things (presumably legitimate places where as.character gets called elsewhere in the dplyr code). mockery might allow this more targeted but I haven't had great success setting it up before; there are other ways we could instrument our code to make sure the Arrow function is called, but I'm not sure it's worth it. It definitely won't work if you called toupper on a Dataset expression, so we get some related test coverage there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did work out a test for this, PTAL

@nealrichardson
Copy link
Member Author

I'm going to merge this now. We're going to be iterating on this code for a while so I don't feel compelled to make everything perfect and thorough here.

@nealrichardson nealrichardson deleted the r-strings branch February 16, 2021 21:30
collect(),
tbl
),
NA)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants