Skip to content

Conversation

@SimonHeybrock
Copy link
Member

Related: #3391.

@SimonHeybrock SimonHeybrock changed the title Add ADR-0019: Change binned data unit and dtype properties to return elem unit and dtype RFC: ADR-0019 / Change binned data unit and dtype properties to return elem unit and dtype Feb 4, 2025

- The API becomes more consistent and easier to use.
- Fewer bugs due to incorrect handling of binned data.
- Avoid confusing users with `DataArrayView` and `VariableView` dtype displayed, e.g., in `_repr_html_`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way at runtime to tell which bin type a variable has?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like isnstance(obj.values, sc.Variable) (etc.) is kind of equivalent (with rare exceptions). We should also add a property to obj.bins is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is isinstance(obj.values, sc.Variable) O(1)? Or does it have to build a new array of variables / data arrays / datasets objects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is O(1), but you are right that it may have a non-trivial cost. But as I said, some other property can be added. Also, I cannot think of many places where it might be used, so performance might be irrelevant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be. I just want to make sure that we don't completely remove this possiblity (without it requiring a large allocation).

@SimonHeybrock
Copy link
Member Author

I pushed an update with a working refactor, if someone would like to play with it.

To do: Change unit setter as well.

return None
_write_scipp_header(group, 'Variable')
dset = cls._write_data(group, var)
if var.bins is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand from the changes if we would now be able or not able to read old scipp .h5 files that contain binned data...
Can you help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should, that is why I kept writing the "old" dtype of the binned variable.

@SimonHeybrock
Copy link
Member Author

Updated based on discussion with @jl-wynen and added unit setter. Please have another look.

Comment on lines -36 to +39
unit = var.bins.unit if var.bins is not None else var.unit
try:
unit = var.unit
except RuntimeError: # binned variable with Dataset content
unit = None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this change is relatively pointless: This didn't work previously (Dataset has not property unit), but at least now we get a table of bin lengths...

@SimonHeybrock
Copy link
Member Author

SimonHeybrock commented Feb 5, 2025

The following packages passed tests without changes:

  • scippneutron
  • scippnexus
  • plopp
  • essreduce
  • esssans

@jl-wynen
Copy link
Member

jl-wynen commented Feb 5, 2025

Also tested:

  • essspectroscopy

@SimonHeybrock
Copy link
Member Author

Final question: Are we all ok with making this breaking change without deprecation or other prior warnings? I feel it is ok since we kind of have proof that the impact is minimal, but please consider it for a minute.

@jl-wynen
Copy link
Member

jl-wynen commented Feb 7, 2025

Final question: Are we all ok with making this breaking change without deprecation or other prior warnings? I feel it is ok since we kind of have proof that the impact is minimal, but please consider it for a minute.

I agree that this in unlikely to break anyone's code. So we should be able to just make the change.

@nvaytet
Copy link
Member

nvaytet commented Feb 17, 2025

It is still possible to access the unit of the underlying variable inside the .bins (we have not removed that).
The only thing that changed is that the top level binned_data.coords['tof'].unit is not longer None.

I'm pretty sure no code (even user code) was checking for unit is None anywhere (the check is usually da.bins is not None), so I also think it's safe to do the change?

@SimonHeybrock
Copy link
Member Author

I'm pretty sure no code (even user code) was checking for unit is None anywhere (the check is usually da.bins is not None), so I also think it's safe to do the change?

unit is (not) None is checked in quite a few places, at least internally, not sure what you have in mind?

@nvaytet
Copy link
Member

nvaytet commented Feb 17, 2025

But is that when you are checking to see if the data is binned?

@SimonHeybrock
Copy link
Member Author

But is that when you are checking to see if the data is binned?

Are you asking whether someone ever did unit is None to check if the data is binned? Then the answer is "no", because we have unit is None also in many other places, for example for integer IDs, booleans, etc.

@nvaytet
Copy link
Member

nvaytet commented Feb 17, 2025

Yes, that was what I was saying. That it was extremely inlikely that someone tried to do this to check if the data is binned.
And that this would be the only case that could go wrong with the proposed changes.
It should therefore be safe to go ahead with the change.

The changes here don't break things like elem_unit, they just remove the need for it.

@SimonHeybrock SimonHeybrock merged commit c7a3833 into main Feb 17, 2025
4 checks passed
@SimonHeybrock SimonHeybrock deleted the adr-0019 branch February 17, 2025 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants