-
Notifications
You must be signed in to change notification settings - Fork 21
Fix operations on transposed binned data; remove binned data operation overhead #3282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jl-wynen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me.
| const auto indices2d = | ||
| makeVariable<scipp::index_pair>(Dims{Dim::X, Dim::Y}, Shape{2, 2}, | ||
| Values{std::pair{0, 1}, std::pair{1, 2}, | ||
| std::pair{2, 4}, std::pair{4, 5}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add an empty bin?
| auto data_buffer = | ||
| variable::variableFactory().create(type, dims, unit, variances); | ||
| // If the buffer size is unchanged and input indices match output indices we | ||
| // can use a cheap and simply copy of the buffers coords and masks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // can use a cheap and simply copy of the buffers coords and masks. | |
| // can use a cheap and simple copy of the buffer's coords and masks. |
| return make_bins(copy(indices), dim, | ||
| variableFactory().create(type, dims, unit, variances)); | ||
| return make_bins_no_validate( | ||
| indices, dim, variableFactory().create(type, dims, unit, variances)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this now lead to sharing indices? Or is this function only called with a fresh set of indices that the user has no handle of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell it is always fresh, but please double check! The intention is that this does not lead to sharing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is only called in one place and that passes in freshly created indices.
9addc8f to
5a56d47
Compare
Fixes #3277.
Reviewer: Please consider the logic very carefully, I think it is probably not enough to look at the diff. Please also verify that we can indeed skip the bin-index validation.
Add support for operations on slices of binned data, which raised previously.
Aside from these changes, I am avoiding overhead in the setup logic of binned variables (tracing down this overhead is what actually led me to discover the linked bug). For few bins this is not so relevant, but it becomes significant when we have many bins. This may also impact some benchmarks @YooSunYoung is running for data-streaming applications.
Examples:
For 50M bins and 100M events in
da, times reduce from around 2 seconds to 0.7 seconds, i.e., we removed 1.3 seconds overhead. For 10M bins I see around 250 ms gain, i.e., we can see this is roughly proportional to the total bin count.