GH-34909: [C++] Avoid mean overflow on large integer inputs#37243
GH-34909: [C++] Avoid mean overflow on large integer inputs#37243pitrou merged 2 commits intoapache:mainfrom
Conversation
d4a8f81 to
fbd528c
Compare
|
In the current implementation, the overflow problem is not considered |
just like: sum(int64)-> output also is int64(), will cause overflow. |
|
What you said is correct. However, this PR is intended to deal with the |
|
@felipecrv @pitrou I think this PR is ready for review. Would you mind taking a look? Thanks! |
pitrou
left a comment
There was a problem hiding this comment.
One more comment. Also, can you rebase/merge from main?
There was a problem hiding this comment.
Can you add a comment above explaining that you're overriding the sum result type and why?
There was a problem hiding this comment.
Done and rebased from main
fbd528c to
bcbab9c
Compare
|
Thank you @js8544 ! |
|
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e0ee82c. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ache#37243) ### Rationale for this change The `mean` aggregate function would overflow if the input array's sum is larger than int64_max. ### What changes are included in this PR? Store intermediate sum in double instead of int64, so that it won't overflow. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#34909 Authored-by: Jin Shang <shangjin1997@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ache#37243) ### Rationale for this change The `mean` aggregate function would overflow if the input array's sum is larger than int64_max. ### What changes are included in this PR? Store intermediate sum in double instead of int64, so that it won't overflow. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#34909 Authored-by: Jin Shang <shangjin1997@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
The
meanaggregate function would overflow if the input array's sum is larger than int64_max.What changes are included in this PR?
Store intermediate sum in double instead of int64, so that it won't overflow.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.