-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10070: [C++][Compute] Implement var and std aggregate kernel #8269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Dev / Lint ci failure looks not related. missing license header? |
Yes, #8273 should fix that. |
|
We might want to have an option to specify the denominator? (whether it is |
Thank you, will do. |
|
Added option |
|
Two notes:
|
|
Thanks @nealrichardson
Naming is always the hardest thing :) Looks
I also thought about the |
|
Added variance kernel |
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rather "stddev" and "variance" respectively. "std" and "var" can easily be misunderstood, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ironically, the fact that this is random identically-distributed data means that the slices will have similar means and variances. Perhaps making the array smaller would make the test more significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test with chunks of very different sizes? For example [1, 2, 3, 4, 5, 6, 7] and [8].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test with ddof = 1? Though with such a large array size, the change may be rather small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming suggestion: how about DdofOptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we will add options later, e.g., to ignore NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there might be other options in the future. But with the renaming of the functions, maybe call it VarianceOptions instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of struggling as stddev kernel uses this same option.
Maybe export both VarianceOptions and StddevOptions and alias them as same type internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one is useful. Just VarianceOptions IMHO.
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @cyb70289
|
I suppose the behaviour with NaN is that any NaN in the input gives NaN as result? That might be worth adding a test for? |
|
Since we're not doing anything special, it certainly will. I think that can be tackled in a separate JIRA, when adding a nan handling option perhaps. |
No description provided.