-
Notifications
You must be signed in to change notification settings - Fork 21
Fix empty bin confusing error #3239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/scipp/core/binning.py
Outdated
| def _total_number_of_events_in_binned(x: DataArray): | ||
| return (x.bins.constituents['end'] - x.bins.constituents['begin']).sum().value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, since this contains all events, even those that are not in any bins in Sorry I misread. But you can just use x. The most common case where this occurs is after slicing. You should be able to write a unit test that still shows the initial error, even after this fix: da['y', 0].bin(x=4).x.bins.size().sum() to achieve correctly what you did by hand.
Instead of counting events, can you just use the min and max and check those? Do we get a similar error when there is just a single event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, I'll use x.bins.size().sum() instead.
Instead of counting events, can you just use the min and max and check those?
I don't understand, do you mean min and max of the coordinate?
With a single event we get a binning where every edge has the same value.
import scipp as sc
import numpy as np
rng = np.random.default_rng(123)
lati = rng.random(10)
# modify a single entry to be in the 'lati' bin with index 1
lati[-1] = 1.5
da = sc.DataArray(
data=sc.ones(sizes={"row": 10}, unit="counts"),
coords={
'longi': sc.array(dims=['row'], values=rng.random(10)),
'lati': sc.array(dims=['row'], values=lati),
}
)
empty_b = da.bin(lati=sc.array(dims=['lati'], values=[0.0, 1.2, 2.0]))['lati', 1]
empty_b.bin(longi=5)
# <scipp.DataArray>Dimensions: Sizes[longi:5, ] Coordinates: ... longi float64 [dimensionless] (longi [bin-edge]) [0.518165, 0.518165, ..., 0.518165, 0.518165] Data: ...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mean use what is computed below anyway, and check if the range is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha okay, I'll try to do that instead.
tests/binning_test.py
Outdated
| table.bin(label=sc.scalar(0.5, unit='m')) | ||
|
|
||
|
|
||
| def test_bin_empty(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def test_bin_empty(): | |
| def test_bin_with_automatic_bin_bounds_raises_if_no_events(): |
or something like that?
| table = sc.data.table_xyz(0) | ||
| with pytest.raises(ValueError): | ||
| table.bin(x=4) | ||
| table.bin(x=sc.linspace('x', 0, 1, 4, unit=table.coords['x'].unit)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this belongs in the same test?
src/scipp/core/binning.py
Outdated
| size_of_relevant_data_range = ( | ||
| max(map(_total_number_of_events_in_binned, x.values())) if isinstance(x, Dataset) else | ||
| _total_number_of_events_in_binned(x) if x.bins else | ||
| len(x.values) | ||
| ) | ||
| if size_of_relevant_data_range == 0: | ||
| raise ValueError('Empty data range, cannot automatically determine bounds. Must provide concrete bin edges.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like handling the start and stop below may be simpler?
3a92324 to
c237ce4
Compare
c237ce4 to
b4b7b57
Compare
| table.bin(x=4) | ||
|
|
||
|
|
||
| def test_bin_with_manual_bin_bounds_not_raises_if_no_events(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I wonder: If you give bad manual bounds (similar to the ones computed automatically with no events, e.g., not increasing), do we also get a weird error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we manually give bad edges like
table = sc.data.table_xyz(0)
table.bin(x=sc.array(dims=['x'], values=[3.0, 1.2, 2.0]))
# Raises: BinEdgeError: Bin edges in dim x must be sorted.the error message makes a lot of sense.
Was that what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good 👍
b4b7b57 to
a2027fd
Compare
This pr fixes #3209 by raising an error if
The data set can be either a
DataArrayor aDataset, this makes the test a bit more cumbersome.The binning should be fine if any of the
DataArraysin the data set contains events.I'm using
bins.constituentsto figure out the total number of events in a binnedDataArray, is there a better way?