Skip to content

Conversation

@jokasimr
Copy link
Contributor

@jokasimr jokasimr commented Sep 12, 2023

This pr fixes #3209 by raising an error if

  • the selected data contains no entries and
  • automatic binning was requested.

The data set can be either a DataArray or a Dataset, this makes the test a bit more cumbersome.
The binning should be fine if any of the DataArrays in the data set contains events.

I'm using bins.constituents to figure out the total number of events in a binned DataArray, is there a better way?

Comment on lines 205 to 206
def _total_number_of_events_in_binned(x: DataArray):
return (x.bins.constituents['end'] - x.bins.constituents['begin']).sum().value
Copy link
Member

@SimonHeybrock SimonHeybrock Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work, since this contains all events, even those that are not in any bins in x. The most common case where this occurs is after slicing. You should be able to write a unit test that still shows the initial error, even after this fix: da['y', 0].bin(x=4). Sorry I misread. But you can just use x.bins.size().sum() to achieve correctly what you did by hand.

Instead of counting events, can you just use the min and max and check those? Do we get a similar error when there is just a single event?

Copy link
Contributor Author

@jokasimr jokasimr Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, I'll use x.bins.size().sum() instead.

Instead of counting events, can you just use the min and max and check those?

I don't understand, do you mean min and max of the coordinate?

With a single event we get a binning where every edge has the same value.

import scipp as sc
import numpy as np
rng = np.random.default_rng(123)

lati = rng.random(10)
# modify a single entry to be in the 'lati' bin with index 1
lati[-1] = 1.5

da = sc.DataArray(
    data=sc.ones(sizes={"row": 10}, unit="counts"),
    coords={
        'longi': sc.array(dims=['row'], values=rng.random(10)),
        'lati': sc.array(dims=['row'], values=lati),
    }
)

empty_b = da.bin(lati=sc.array(dims=['lati'], values=[0.0, 1.2, 2.0]))['lati', 1]
empty_b.bin(longi=5)
# <scipp.DataArray>Dimensions: Sizes[longi:5, ] Coordinates: ... longi float64  [dimensionless]  (longi [bin-edge])  [0.518165, 0.518165, ..., 0.518165, 0.518165] Data: ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean use what is computed below anyway, and check if the range is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha okay, I'll try to do that instead.

table.bin(label=sc.scalar(0.5, unit='m'))


def test_bin_empty():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_bin_empty():
def test_bin_with_automatic_bin_bounds_raises_if_no_events():

or something like that?

table = sc.data.table_xyz(0)
with pytest.raises(ValueError):
table.bin(x=4)
table.bin(x=sc.linspace('x', 0, 1, 4, unit=table.coords['x'].unit))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this belongs in the same test?

Comment on lines 214 to 220
size_of_relevant_data_range = (
max(map(_total_number_of_events_in_binned, x.values())) if isinstance(x, Dataset) else
_total_number_of_events_in_binned(x) if x.bins else
len(x.values)
)
if size_of_relevant_data_range == 0:
raise ValueError('Empty data range, cannot automatically determine bounds. Must provide concrete bin edges.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like handling the start and stop below may be simpler?

@jokasimr jokasimr force-pushed the fix-empty-bin-confusing-error branch from 3a92324 to c237ce4 Compare September 12, 2023 15:34
@jokasimr jokasimr force-pushed the fix-empty-bin-confusing-error branch from c237ce4 to b4b7b57 Compare September 13, 2023 08:08
table.bin(x=4)


def test_bin_with_manual_bin_bounds_not_raises_if_no_events():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I wonder: If you give bad manual bounds (similar to the ones computed automatically with no events, e.g., not increasing), do we also get a weird error?

Copy link
Contributor Author

@jokasimr jokasimr Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we manually give bad edges like

table = sc.data.table_xyz(0)
table.bin(x=sc.array(dims=['x'], values=[3.0, 1.2, 2.0]))
# Raises: BinEdgeError: Bin edges in dim x must be sorted.

the error message makes a lot of sense.
Was that what you meant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good 👍

@jokasimr jokasimr force-pushed the fix-empty-bin-confusing-error branch from b4b7b57 to a2027fd Compare September 14, 2023 08:13
@jokasimr jokasimr merged commit eab366d into main Sep 14, 2023
@jokasimr jokasimr deleted the fix-empty-bin-confusing-error branch September 14, 2023 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Confusing error message from binning empty data

4 participants