-
Notifications
You must be signed in to change notification settings - Fork 21
RFC: ADR-0019 / Change binned data unit and dtype properties to return elem unit and dtype #3645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
docs/development/adr/0019-binned-data-unit-dtype-python-properties.md
Outdated
Show resolved
Hide resolved
|
|
||
| - The API becomes more consistent and easier to use. | ||
| - Fewer bugs due to incorrect handling of binned data. | ||
| - Avoid confusing users with `DataArrayView` and `VariableView` dtype displayed, e.g., in `_repr_html_`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way at runtime to tell which bin type a variable has?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like isnstance(obj.values, sc.Variable) (etc.) is kind of equivalent (with rare exceptions). We should also add a property to obj.bins is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is isinstance(obj.values, sc.Variable) O(1)? Or does it have to build a new array of variables / data arrays / datasets objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is O(1), but you are right that it may have a non-trivial cost. But as I said, some other property can be added. Also, I cannot think of many places where it might be used, so performance might be irrelevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be. I just want to make sure that we don't completely remove this possiblity (without it requiring a large allocation).
|
I pushed an update with a working refactor, if someone would like to play with it. To do: Change |
| return None | ||
| _write_scipp_header(group, 'Variable') | ||
| dset = cls._write_data(group, var) | ||
| if var.bins is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand from the changes if we would now be able or not able to read old scipp .h5 files that contain binned data...
Can you help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should, that is why I kept writing the "old" dtype of the binned variable.
|
Updated based on discussion with @jl-wynen and added unit setter. Please have another look. |
| unit = var.bins.unit if var.bins is not None else var.unit | ||
| try: | ||
| unit = var.unit | ||
| except RuntimeError: # binned variable with Dataset content | ||
| unit = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this change is relatively pointless: This didn't work previously (Dataset has not property unit), but at least now we get a table of bin lengths...
|
The following packages passed tests without changes:
|
|
Also tested:
|
|
Final question: Are we all ok with making this breaking change without deprecation or other prior warnings? I feel it is ok since we kind of have proof that the impact is minimal, but please consider it for a minute. |
I agree that this in unlikely to break anyone's code. So we should be able to just make the change. |
|
It is still possible to access the unit of the underlying variable inside the I'm pretty sure no code (even user code) was checking for |
|
|
But is that when you are checking to see if the data is binned? |
Are you asking whether someone ever did |
|
Yes, that was what I was saying. That it was extremely inlikely that someone tried to do this to check if the data is binned. The changes here don't break things like |
Related: #3391.