Handle scale_factor and add_offset as scalar #4485
Conversation
The h5netcdf engine exposes single-valued attributes as arrays of shape ``(1,)``, which is correct according to the NetCDF standard, but can cause a problem when reading a value of shape ``()`` before scale_factor and add_offset have been applied. This PR adds a check for the dimensionality of add_offset and scale_factor, ensuring both are scalar before they are used for further processing; adds a unit test verifying that this works correctly; and adds a note to the documentation warning users of this difference between the h5netcdf and netcdf4 engines. Fixes pydata#4471.
Is this bugfix notable enough to need a whats-new entry?

For the unit test, I tried to construct an object that would emulate what is produced when reading a NetCDF4 file with the h5netcdf engine, but I gave up and settled for a temporary file instead. If this is an undesirable approach, I could use some guidance on how to construct the appropriate object that will expose the problem.
dcherian
left a comment
@gerritholl I think a whats-new entry would be appropriate.
> There may be minor differences in the :py:class:`Dataset` object returned
> when reading a NetCDF file with different engines. For example,
> single-valued attributes are returned as scalars by the default
> ``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
> ``engine=h5netcdf``.
I'm not 100% sure we need to mention this; I think it sort of goes without saying that different backends may differ in minor ways.
As a user with a much shallower understanding of how backends interact with xarray, I was surprised by this. Should I keep this note or remove it?
It makes total sense but I wasn't aware either, so I'd leave it.
Add a whats-new entry for the fix to issue pydata#4471, corresponding to PR pydata#4485.
If this makes more sense as an integration test than as a unit test (for which I need help, see other comment), should I mark the current test in some way and/or move it to a different source file?
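For reference, the in-memory emulation described above might look like the following. This is a hypothetical sketch, not the test from the PR; it assumes xarray (with the fix applied) and NumPy, and uses shape-``(1,)`` attribute arrays to mimic what the h5netcdf engine exposes:

```python
import numpy as np
import xarray as xr

# Build a raw (undecoded) dataset whose CF-packing attributes are
# shape-(1,) arrays, mimicking what the h5netcdf engine exposes.
raw = xr.Dataset(
    {
        "x": (
            "t",
            np.array([-20, -18, -16], dtype="i2"),
            {"scale_factor": np.array([0.5]), "add_offset": np.array([10.0])},
        )
    }
)

# With the fix, CF decoding unpacks correctly despite the array-valued
# attributes: decoded value = raw * scale_factor + add_offset
decoded = xr.decode_cf(raw)
```

Constructing the dataset in memory like this avoids the temporary file, at the cost of bypassing the h5netcdf reading path itself, which may be why the file-based test is the more faithful reproduction.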
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Thanks @gerritholl
isort . && black . && mypy . && flake8
whats-new.rst