Skip to content

Conversation

@dragosmg
Copy link
Contributor

@dragosmg dragosmg commented Feb 21, 2022

Once this PR is merge the {arrow} binding for month() will be able to accept integer inputs. This also impact the semester() which now will be able to extract semesters from integer inputs.

library(lubridate)
library(dplyr)
library(arrow)

df <- tibble(
  month_as_int = 1:12
)

df %>% 
  mutate(m = month(month_as_int),
         s = semester(month_as_int))
#> # A tibble: 12 × 3
#>    month_as_int     m     s
#>           <int> <int> <int>
#>  1            1     1     1
#>  2            2     2     1
#>  3            3     3     1
#>  4            4     4     1
#>  5            5     5     1
#>  6            6     6     1
#>  7            7     7     2
#>  8            8     8     2
#>  9            9     9     2
#> 10           10    10     2
#> 11           11    11     2
#> 12           12    12     2

df %>% 
  arrow_table() %>% 
  mutate(m = month(month_as_int),
         s = semester(month_as_int)) %>% 
  collect()
#> # A tibble: 12 × 3
#>    month_as_int     m     s
#>           <int> <int> <int>
#>  1            1     1     1
#>  2            2     2     1
#>  3            3     3     1
#>  4            4     4     1
#>  5            5     5     1
#>  6            6     6     1
#>  7            7     7     2
#>  8            8     8     2
#>  9            9     9     2
#> 10           10    10     2
#> 11           11    11     2
#> 12           12    12     2

Created on 2022-03-02 by the reprex package (v2.0.1)

@github-actions
Copy link

@dragosmg dragosmg force-pushed the lubridate_month_int_input branch from 2965c21 to c6daf89 Compare March 2, 2022 09:19
@dragosmg dragosmg marked this pull request as ready for review March 2, 2022 09:23
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cheeky implementation, but it works. 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually short-circuit here and return the expression with simply if_else when label = FALSE, since that will return an integer itself, yeah?

I agree that this is a little bit unorthodox, but I think it works. An alternative would be to put this value into a date string with a call to the paste binding and then doing strptime on that. Both aren't super ideal — but I suspect that this integer multiplication is going to be more performant (though if we are basing a decision on that, we probably should measure it!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done any benchmarking yet.

@dragosmg dragosmg force-pushed the lubridate_month_int_input branch 4 times, most recently from 3273d94 to ce79b61 Compare March 4, 2022 13:06
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good — a few comments about the structure of the function body.

Also, there's at least one call to Expression$create() that probably could be build_expr(), what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could actually short-circuit here and return the expression with simply if_else when label = FALSE, since that will return an integer itself, yeah?

I agree that this is a little bit unorthodox, but I think it works. An alternative would be to put this value into a date string with a call to the paste binding and then doing strptime on that. Both aren't super ideal — but I suspect that this integer multiplication is going to be more performant (though if we are basing a decision on that, we probably should measure it!).

@dragosmg dragosmg requested a review from jonkeane March 7, 2022 14:23
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two very minor, more stylistic nit-picky comments that you can take or leave

@dragosmg dragosmg force-pushed the lubridate_month_int_input branch from 71e9f4e to 75e8b0a Compare March 8, 2022 18:18
@jonkeane jonkeane closed this in 1b77e6d Mar 9, 2022
@ursabot
Copy link

ursabot commented Mar 9, 2022

Benchmark runs are scheduled for baseline = a76794c and contender = 1b77e6d. 1b77e6d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.25% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.81% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@dragosmg dragosmg deleted the lubridate_month_int_input branch April 26, 2022 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants