Skip to content

Conversation

@SimonHeybrock
Copy link
Member

This replaces the to_events mechanism, which introduced random noise in the bin-edges, causing problems in stream processing. For $N\rightarrow\infty$ to_events should produce the same result as the rebin solution (module differences in how the bin size is chosen).

This also fixes the (I think) bug in to_events, which included counts from unphysical negative-size bins between frames.

Comment on lines +53 to +54
of the strictly increasing sections, with a step size equal to the difference
between the first two values of the section with the minimum start value (which is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a step size equal to the difference between the first two values of the section with the minimum start value

In the previous verison, we were taking like a mean bin size. I felt it's was little safer than just taking the first one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to minimize the amount of resampling performed. This implementation should effectively "do nothing" for one of the frames. Can we keep it like this and see if it works well in practice?

between the first two values of the section with the minimum start value (which is
not necessarily the first section).
"""
min_val, max_val = get_min_max(var, dim=dim, slices=slices)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we searching on the slices instead of just a nanmin/nanmax of var because the bounds of var might not belong to a section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, we want to get the range that the union of sections covers.

return sc.arange(
dim=dim,
start=min_val.value,
stop=max_val.value + step.value, # Ensure the last bin edge is included
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember struggling with arange and step before, with cases where if max_val was an exact multiple of step, then the max of the arange would be max_val.
E.g.
Screenshot_20250526_112505

When using this to histogram afterwards, the max_val would then be dropped (because we include the min value and exclude the max when using bin edges).

Can the same thing happen here? (or maybe rebin does not have the same issue as hist?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebin does have the same issue as hist. But this case is different: We did not compute a max of events. We computed a max of bin edges, so the resulting value should already be an epsilon above the last event (if there were events, which there are not).

Or did you ask about sth. else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it wasn't something different. Because it's max of bin edges, it's not an issue here. I think that's correct.

assert sections == [slice(0, 2), slice(2, 5), slice(5, 8)]

def test_given_flat_sections_finds_strictly_increasing_parts_only(self):
var = sc.array(dims=['x'], values=[1, 2, 2, 3, 4, 4, 5])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: what happens with something like [1, 2, 2, 2, 3, 4, 4, 5]? (note the three 2s)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test now.

assert max_val.value == 50 # The max value is at the end of the second slice

def test_with_float_data(self):
var = sc.array(dims=['x'], values=[1.1, 2.2, 3.3, 4.4, 5.5])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also work with datetimes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect datetime input coords? Does tof support that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't expect datetimes. I was just wondering if it could be used as a general tool.
But let's not complicate things.

Copy link
Member

@nvaytet nvaytet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for getting this to work in a proper way 👍

@SimonHeybrock SimonHeybrock merged commit 70f9261 into main May 26, 2025
4 checks passed
@SimonHeybrock SimonHeybrock deleted the toa-tof-rebin branch May 26, 2025 10:22
@github-project-automation github-project-automation bot moved this from Selected to Done in Development Board May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants