Nondeterministic bug with bytestring decoding #3991

@lamorton

Description

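I have an HDF5 dataset with a scalar variable called 'name' that is actually a 0-D NumPy array with dtype '|S8'. (Not my choice; this is what I get from someone else.) Loading the file with xarray fails intermittently: the same call on the same file sometimes succeeds and sometimes raises a UnicodeDecodeError.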

MCVE Code Sample

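# Set up the file
import h5py
import numpy as np  # needed for np.array below

# Write a scalar (0-D) dataset holding a fixed-length 8-byte string
f = h5py.File("error_demo.h5", mode='w')
f.create_dataset('name', shape=(), dtype="|S8", data=np.array([b'f(Pt,TE)'], dtype='|S8'))
f.close()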

#Produce the error -- you may need to adjust the number of times you run the loop
import xarray as xr
for i in range(10):
    xr.load_dataset("error_demo.h5")
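Because the failure is intermittent, it can help to count how often it happens rather than wait for the first exception to abort the loop. The helper below is only a sketch: `count_failures` is a hypothetical name, not part of xarray, and it takes the loading step as a callable so it does not assume any particular backend.

```python
def count_failures(load, attempts=10):
    """Call load() `attempts` times and collect UnicodeDecodeError messages.

    `load` is any zero-argument callable (hypothetical helper, not an xarray API).
    Other exception types are deliberately left to propagate.
    """
    errors = []
    for _ in range(attempts):
        try:
            load()
        except UnicodeDecodeError as exc:
            errors.append(str(exc))
    return errors
```

With the demo file above, this could be called as `count_failures(lambda: xr.load_dataset("error_demo.h5"), attempts=100)`; the length of the returned list is the number of failed loads.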

Expected Output

<xarray.Dataset>
Dimensions:  ()
Data variables:
    name     <U8 'f(Pt,TE)'

Problem Description

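Loading intermittently fails with the error below. (This traceback was captured with my original data file, not the demo file above.)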
Traceback (most recent call last):

  File "<ipython-input-3-b8e48f28a262>", line 1, in <module>
    mcout62 = xr.load_dataset("57062/mcout000011.h5",group=r"part/ions/dE(r,z,D)")

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 261, in load_dataset
    return ds.load()

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 659, in load
    v.load()

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/variable.py", line 375, in load
    self._data = np.asarray(self._data)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 677, in __array__
    self._ensure_cached()

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 674, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 653, in __array__
    return np.asarray(self.array, dtype=dtype)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 557, in __array__
    return np.asarray(array[self.key], dtype=None)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 73, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 837, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)

  File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 85, in _getitem
    array = getitem(original_array, key)

  File "netCDF4/_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.__getitem__

  File "netCDF4/_netCDF4.pyx", line 5384, in netCDF4._netCDF4.Variable._get

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 9: invalid start byte
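One detail in this message may be a useful clue: the stored string is only 8 bytes long, yet the decoder complains about position 9, which lies outside the field. The sketch below does not touch netCDF4 or HDF5 at all. It only shows, in plain Python and NumPy, that the 8-byte payload decodes cleanly on its own and that appending two stray bytes produces an error at the same position. The two appended bytes are made up for illustration, and whether the library actually reads past the end of the field is an assumption, not something confirmed here.

```python
import numpy as np

# The value from the report: a 0-D array of dtype '|S8' whose text fills all 8 bytes,
# so there is no room left for a trailing NUL terminator.
value = np.array(b"f(Pt,TE)", dtype="|S8")
raw = value.tobytes()
print(len(raw))             # 8
print(raw.decode("utf-8"))  # f(Pt,TE)

# Assumption: two invented bytes standing in for whatever happens to follow
# the field in memory. 0x01 is valid ASCII; 0xc1 can never start a UTF-8 sequence.
overrun = raw + b"\x01\xc1"
try:
    overrun.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The failing byte is at index 9, i.e. beyond the 8-byte field (indices 0 to 7).
print(error.start if error else None)
```

If the bytes following the field vary from run to run, that would also be consistent with the load failing only some of the time.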

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 (default, Jan 8 2020, 13:42:34)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 19.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.11.0
distributed: 2.11.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.0.0.post20200309
pip: 20.0.2
conda: 4.8.3
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.0
