Conversation

@lidavidm (Member)

Note these don't use pairwise summation and so may be prone to precision issues.
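
For context, a minimal sketch of the distinction, assuming a plain floating-point accumulator (illustrative only, not the kernel's implementation): naive left-to-right accumulation lets rounding error grow with the number of values, while pairwise summation recursively splits the input and sums the halves, which keeps error growth roughly logarithmic.

```cpp
#include <cstddef>

// Illustrative only, not the kernel's implementation.
double NaiveSum(const double* values, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += values[i];  // rounding error accumulates linearly
  return sum;
}

double PairwiseSum(const double* values, std::size_t n) {
  if (n <= 8) return NaiveSum(values, n);  // small base case: sum directly
  const std::size_t half = n / 2;
  // Recursively sum each half; error grows ~O(log n) instead of ~O(n).
  return PairwiseSum(values, half) + PairwiseSum(values + half, n - half);
}
```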

Member

Should there be the same comment in GroupedSumImpl?

Member

The error message should be updated for this operation.

Member

It's a pity this duplicates the existing math from the scalar aggregate kernel.

How would you feel about factoring the underlying math into a simple VarStdOp<ArrowType> that you feed values to? You would have one VarStdOp in the scalar aggregate kernel and num_groups_ of them in the hash aggregate kernel.

That might perform a bit differently, because you would have an array-of-structures std::vector<VarStdOp> rather than the current structure-of-arrays of three BufferBuilders, but I'm not sure it really matters here.

OTOH, Consume would not really benefit, because the scalar aggregate kernel uses pairwise summation for floating-point input.

cc @bkietz for advice
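
For concreteness, here is a minimal sketch of what such a VarStdOp accumulator could look like (untemplated and double-only for brevity; the member names, the Welford-style Consume, and the Chan-et-al.-style merge are assumptions for illustration, not the code in this PR). The scalar kernel would own a single instance, while the hash kernel would keep a std::vector<VarStdOp> indexed by group id:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch of the suggested accumulator (not the actual Arrow code).
// Keeps a running count, mean, and sum of squared deviations (m2), so that
// variance/stddev can be finalized with any ddof.
struct VarStdOp {
  int64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the running mean

  // Feed one non-null value (Welford's online update).
  void Consume(double value) {
    ++count;
    const double delta = value - mean;
    mean += delta / count;
    m2 += delta * (value - mean);
  }

  // Combine another partial state (Chan et al. parallel merge), e.g. in Merge().
  void MergeFrom(const VarStdOp& other) {
    if (other.count == 0) return;
    if (count == 0) {
      *this = other;
      return;
    }
    const double delta = other.mean - mean;
    const int64_t total = count + other.count;
    m2 += other.m2 + delta * delta * static_cast<double>(count) * other.count / total;
    mean += delta * other.count / total;
    count = total;
  }

  // Variance with the given ddof, or nullopt when there are too few values.
  std::optional<double> FinalizeVariance(int ddof) const {
    if (count - ddof <= 0) return std::nullopt;
    return m2 / (count - ddof);
  }
};
```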

Member

Also @cyb70289

Member Author

Another option that Ben's mentioned would be to treat scalar aggregation as a hash aggregation with one group, though then we should immediately tackle the pairwise summation issue.

Member Author

I'll see if I can get a comparison up before I'm out.

Member Author

It's unfortunately not very good. See #10813.

Member Author

I'll try refactoring it instead and compare the performance.

Member

These tests are unfortunately verbose, but perhaps you could nevertheless add another test with a different ddof (and one group with an insufficient number of non-null values)?
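
To make that edge case concrete, here is a hypothetical illustration (not this PR's test code): with delta degrees of freedom d, the variance is m2 / (count - d), so a group whose non-null count is less than or equal to d should yield a null result.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper (not the PR's test code): variance with the given ddof,
// or nullopt (a null slot in the result array) when there are too few values.
std::optional<double> Variance(double m2, long long count, int ddof) {
  if (count - ddof <= 0) return std::nullopt;
  return m2 / (count - ddof);
}

int main() {
  // A group {1, 2, 3}: mean = 2, m2 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2.
  assert(Variance(2.0, 3, /*ddof=*/1).value() == 1.0);  // sample variance
  assert(Variance(2.0, 3, /*ddof=*/2).value() == 2.0);  // custom ddof
  assert(!Variance(2.0, 3, /*ddof=*/3).has_value());    // count <= ddof -> null
  return 0;
}
```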

@lidavidm (Member Author)
@ursabot please benchmark lang=C++

@ursabot commented Jul 27, 2021

Benchmark runs are scheduled for baseline = 31b60f3 and contender = df99462. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
- [Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
- [Skipped ⚠️ Only ['Python', 'R', 'JavaScript'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
- [Failed] ursa-thinkcentre-m75q (mimalloc)

Supported benchmarks:
- ursa-i9-9960x: langs = Python, R, JavaScript
- ursa-thinkcentre-m75q: langs = C++, Java
- ec2-t3-xlarge-us-east-2: cloud = True

@pitrou (Member) commented Jul 29, 2021

@ursabot please benchmark lang=C++

@ursabot commented Jul 29, 2021

Benchmark runs are scheduled for baseline = 31b60f3 and contender = 90c91fa. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
- [Skipped ⚠️ Provided benchmark filters do not have any benchmark groups to be executed on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2 (mimalloc)
- [Skipped ⚠️ Only ['Python', 'R', 'JavaScript'] langs are supported on ursa-i9-9960x] ursa-i9-9960x (mimalloc)
- [Finished ⬇️0.67% ⬆️0.05%] ursa-thinkcentre-m75q (mimalloc)

Supported benchmarks:
- ursa-i9-9960x: langs = Python, R, JavaScript
- ursa-thinkcentre-m75q: langs = C++, Java
- ec2-t3-xlarge-us-east-2: cloud = True

@lidavidm (Member Author) commented Aug 2, 2021

From conbench it looks like the changes generally don't affect the performance of the existing variance/stddev kernels except for maybe the int32 variance case (likely a fluke?).

@pitrou (Member) commented Aug 2, 2021

It's difficult to navigate the conbench results, but they don't seem concerning in any way.

@pitrou (Member) left a comment

+1, thank you very much @lidavidm
