Skip to content

Conversation

@dragosmg
Copy link
Contributor

@dragosmg dragosmg commented Feb 15, 2022

The following will be supported in arrow (base::as.Date() included to show the difference to date(), but not supported):

library(dplyr)
library(lubridate)

df <- tibble(a = as.POSIXct("2012-03-26 23:12:13", tz = "America/New_York"))

df %>%
  mutate(
    a_date = date(a),
    a_date_base = as.Date(a))
#> # A tibble: 1 × 3
#>   a                   a_date     a_date_base
#>   <dttm>              <date>     <date>     
#> 1 2012-03-26 23:12:13 2012-03-26 2012-03-27
library(arrow)
library(dplyr)
library(lubridate)

df <- tibble(a = as.POSIXct("2012-03-26 23:12:13", tz = "America/New_York"))
df %>% 
  arrow_table() %>% 
  mutate(
    a_date = date(a)
  ) %>% 
  collect()
#> # A tibble: 1 × 2
#>   a                   a_date    
#>   <dttm>              <date>    
#> 1 2012-03-26 23:12:13 2012-03-26

Created on 2022-02-15 by the reprex package (v2.0.1)

@github-actions
Copy link

@dragosmg dragosmg marked this pull request as draft February 15, 2022 17:26
@dragosmg dragosmg marked this pull request as ready for review February 15, 2022 17:30
@dragosmg dragosmg marked this pull request as draft February 15, 2022 17:45
@dragosmg dragosmg marked this pull request as ready for review February 15, 2022 19:36
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good — a few questions for you

@dragosmg dragosmg requested a review from jonkeane February 22, 2022 09:17
@dragosmg dragosmg force-pushed the lubridate_date branch 2 times, most recently from 57ce8c9 to 2d72983 Compare February 23, 2022 14:27
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment + suggestion.

Do we have a ticket to do as.Date separately? Could we do it with this one since it is very very similar?

@dragosmg dragosmg requested a review from jonkeane February 24, 2022 12:45
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

@dragosmg dragosmg requested a review from jonkeane February 25, 2022 15:39
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is making progress! I didn't quite realize that as.Date() would be as much of a can of worms as it's been — but that's ok, I think it is useful additional functionality!

In the as.Date() implementation it might be nice to restructure the flow so that you have one call at the end to build_expr("cast", intermediate, options = cast_options(to_type = date32())) after having built up intermediate getting things into the right format and then do the cast-to-date (I put intermediate here but that's probably not a good name for it, maybe "just" x like we do elsewhere — I trust you!).

@dragosmg
Copy link
Contributor Author

dragosmg commented Feb 28, 2022

I have created an umbrella issue (ARROW-15805) for the possible improvements to the as.Date() binding + ARROW-15800 to implement as_date().

@dragosmg dragosmg requested a review from jonkeane February 28, 2022 17:43
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, a few more questions. I didn't realize just how much extra scope as.Date() would add — though I think this is good functionality to add, so that's ok, but don't feel dispirited this is taking longer!

@dragosmg dragosmg requested a review from jonkeane March 1, 2022 13:27
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still getting there, most of the code changes I proposed are attempts to avoid some of the nesting in if/elses. Though I also do think we can support as.Date() with any timezone which will simplify + remove some of the tests down below catching that error.

@dragosmg dragosmg requested a review from jonkeane March 2, 2022 12:10
@dragosmg
Copy link
Contributor Author

dragosmg commented Mar 2, 2022

Thanks @jonkeane. Would you mind having another look?

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic, thank you for all of the pushes. This is incredibly close — one minor comment about an additional comment and one suggestion to use compare_dplyr_binding for testing the warning flow "not supported in arrow -> pulling into R". It's functionally the same, but follows the pattern we use elsewhere.

I'm happy to merge even without these, but these two would add a bit of polish + fit

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the CI is green, I'll merge.

Thank you again for all of the work you did on this!

@dragosmg
Copy link
Contributor Author

dragosmg commented Mar 3, 2022

Thanks for your patience and support 😉

@jonkeane jonkeane closed this in 9719eae Mar 3, 2022
@ursabot
Copy link

ursabot commented Mar 3, 2022

Benchmark runs are scheduled for baseline = ce46c1a and contender = 9719eae. 9719eae is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.13% ⬆️0.0%] test-mac-arm
[Failed ⬇️3.21% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.26% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@dragosmg dragosmg deleted the lubridate_date branch March 3, 2022 18:15
jonkeane pushed a commit that referenced this pull request Mar 8, 2022
`build_expr()` handles the conversion of the inputs to `Array` or `Scalar`.

#12433 (comment)

Closes #12563 from dragosmg/build_expr_in_as_methods

Authored-by: Dragoș Moldovan-Grünfeld <dragos.mold@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants