Skip to content

GH-34909: [C++] Avoid mean overflow on large integer inputs#37243

Merged
pitrou merged 2 commits intoapache:mainfrom
js8544:jinshang/aggregate_mean_refactor
Aug 30, 2023
Merged

GH-34909: [C++] Avoid mean overflow on large integer inputs#37243
pitrou merged 2 commits intoapache:mainfrom
js8544:jinshang/aggregate_mean_refactor

Conversation

@js8544
Copy link
Copy Markdown
Contributor

@js8544 js8544 commented Aug 18, 2023

Rationale for this change

The mean aggregate function would overflow if the input array's sum is larger than int64_max.

What changes are included in this PR?

Store intermediate sum in double instead of int64, so that it won't overflow.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@js8544 js8544 force-pushed the jinshang/aggregate_mean_refactor branch from d4a8f81 to fbd528c Compare August 18, 2023 10:53
@js8544 js8544 requested a review from felipecrv August 22, 2023 16:34
@Light-City
Copy link
Copy Markdown
Contributor

In the current implementation, the overflow problem is not considered

@Light-City
Copy link
Copy Markdown
Contributor

In the current implementation, the overflow problem is not considered

just like:

  func = std::make_shared<ScalarAggregateFunction>("sum", Arity::Unary(), sum_doc,
                                                   &default_scalar_aggregate_options);
  AddArrayScalarAggKernels(SumInit, {boolean()}, uint64(), func.get());
  AddAggKernel(KernelSignature::Make({Type::DECIMAL128}, FirstType), SumInit, func.get(),
               SimdLevel::NONE);
  AddAggKernel(KernelSignature::Make({Type::DECIMAL256}, FirstType), SumInit, func.get(),
               SimdLevel::NONE);
  AddArrayScalarAggKernels(SumInit, SignedIntTypes(), int64(), func.get());

sum(int64)-> output also is int64(), will cause overflow.

@js8544
Copy link
Copy Markdown
Contributor Author

js8544 commented Aug 24, 2023

What you said is correct. However, this PR is intended to deal with the mean function, not sum. I'll add a sum_checked function after this PR is merged. Also see #37090 (comment)

@js8544
Copy link
Copy Markdown
Contributor Author

js8544 commented Aug 29, 2023

@felipecrv @pitrou I think this PR is ready for review. Would you mind taking a look? Thanks!

Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment. Also, can you rebase/merge from main?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment above explaining that you're overriding the sum result type and why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and rebased from main

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 30, 2023
@js8544 js8544 force-pushed the jinshang/aggregate_mean_refactor branch from fbd528c to bcbab9c Compare August 30, 2023 14:52
@js8544 js8544 requested a review from pitrou August 30, 2023 14:54
@pitrou pitrou merged commit e0ee82c into apache:main Aug 30, 2023
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Aug 30, 2023
@pitrou
Copy link
Copy Markdown
Member

pitrou commented Aug 30, 2023

Thank you @js8544 !

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Aug 30, 2023
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e0ee82c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…ache#37243)

### Rationale for this change

The `mean` aggregate function would overflow if the input array's sum is larger than int64_max.

### What changes are included in this PR?

Store intermediate sum in double instead of int64, so that it won't overflow.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: apache#34909

Authored-by: Jin Shang <shangjin1997@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ache#37243)

### Rationale for this change

The `mean` aggregate function would overflow if the input array's sum is larger than int64_max.

### What changes are included in this PR?

Store intermediate sum in double instead of int64, so that it won't overflow.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: apache#34909

Authored-by: Jin Shang <shangjin1997@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] mean overflows if numeric sum is larger than int64 max

3 participants