-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13528: [R] Bindings for mean, var, sd aggregation #10996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ow into ARROW_13528_summarise_mean
r/R/dplyr-functions.R
Outdated
| list( | ||
| fun = "stddev", | ||
| data = x, | ||
| options = list(ddof = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've manually set ddof here, similarly to how we've manually set na.min_count without exposing this to the end user, but actually, would it make sense to add it as an argument with a default value of 1? I could see ddof (i.e. delta degrees of freedom) maybe being useful in some statistical contexts.
What governs whether we just mimic existing R behaviour vs. expose Arrow's additional functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scalar functions, you can always call the arrow_ version of the functions if you want arrow behaviors, but that won't work here. Maybe add it to the signature of agg_funcs$sd so it can be passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, thanks
nealrichardson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, generally LGTM
r/R/dplyr-functions.R
Outdated
| list( | ||
| fun = "stddev", | ||
| data = x, | ||
| options = list(ddof = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scalar functions, you can always call the arrow_ version of the functions if you want arrow behaviors, but that won't work here. Maybe add it to the signature of agg_funcs$sd so it can be passed in?
r/R/dplyr-functions.R
Outdated
| options = list(na.rm = na.rm, na.min_count = 0L) | ||
| ) | ||
| } | ||
| agg_funcs$sd <- function(x, na.rm = FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you're passing in na.rm to the options. If that's because of ARROW-13691, leave a comment here about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - updated now!
No description provided.