Skip to content

Conversation

@lidavidm
Copy link
Member

This works by filtering the values (trading off time for memory usage).

@github-actions
Copy link

@lidavidm
Copy link
Member Author

I'm still digging into this, but I noticed the unit tests are especially slow in debug mode. This is because we make a 2 GiB size allocation in GrouperFastImpl::GetUniques, which also makes me worry that these tests might use too much memory for CI.

@lidavidm
Copy link
Member Author

lidavidm commented Aug 26, 2021

Also, filtering is not the right tradeoff to make: I think we don't care too much about the ONLY_NULL case and in the other cases, counting nulls is not that big a deal. So I'll update this to do the filtering in Finalize().

@lidavidm
Copy link
Member Author

Changed to filter the results after consumption. For whatever reason, not filtering before feeding batches into the grouper doesn't cause the runtime/memory blowup.

@lidavidm lidavidm marked this pull request as ready for review August 26, 2021 18:10
@ianmcook ianmcook requested a review from pitrou August 27, 2021 14:14
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. A couple comments, but looks mostly good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values->null_bitmap() may be null here, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a test case where none of the values is null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add that the setting doesn't apply to the grouping keys?

@pitrou
Copy link
Member

pitrou commented Aug 30, 2021

This is because we make a 2 GiB size allocation in GrouperFastImpl::GetUniques

Really? Why does this happen?

@lidavidm
Copy link
Member Author

This is because we make a 2 GiB size allocation in GrouperFastImpl::GetUniques

Really? Why does this happen?

I was trying to figure out why, but didn't get very far in untangling the internals before I decided that filtering before grouping was pointless anyways. This doesn't happen the current implementation, I might poke at this further later though.

@lidavidm
Copy link
Member Author

Thanks for the suggestions. I believe I've addressed everything.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants