Reset file pointer to 0 when reading file stream#7304
Conversation
Instead of raising a ValueError about the file pointer not being at the start of the file, reset the file pointer automatically to zero, and warn that the pointer has been reset.
|
Traceback from the 2 test failures at https://github.com/pydata/xarray/actions/runs/3516849430/jobs/5893926099#step:9:252 =================================== FAILURES ===================================
____________________ TestH5NetCDFFileObject.test_open_twice ____________________
[gw2] linux -- Python 3.10.7 /home/runner/micromamba-root/envs/xarray-tests/bin/python
self = <xarray.tests.test_backends.TestH5NetCDFFileObject object at 0x7f211de81e40>
def test_open_twice(self) -> None:
expected = create_test_data()
expected.attrs["foo"] = "bar"
> with pytest.raises(ValueError, match=r"read/write pointer not at the start"):
E Failed: DID NOT RAISE <class 'ValueError'>
/home/runner/work/xarray/xarray/xarray/tests/test_backends.py:3034: Failed
___________________ TestH5NetCDFFileObject.test_open_fileobj ___________________
[gw2] linux -- Python 3.10.7 /home/runner/micromamba-root/envs/xarray-tests/bin/python
self = <xarray.tests.test_backends.TestH5NetCDFFileObject object at 0x7f211de82530>
@requires_scipy
def test_open_fileobj(self) -> None:
# open in-memory datasets instead of local file paths
expected = create_test_data().drop_vars("dim3")
expected.attrs["foo"] = "bar"
with create_tmp_file() as tmp_file:
expected.to_netcdf(tmp_file, engine="h5netcdf")
with open(tmp_file, "rb") as f:
with open_dataset(f, engine="h5netcdf") as actual:
assert_identical(expected, actual)
f.seek(0)
with open_dataset(f) as actual:
assert_identical(expected, actual)
f.seek(0)
with BytesIO(f.read()) as bio:
with open_dataset(bio, engine="h5netcdf") as actual:
assert_identical(expected, actual)
f.seek(0)
with pytest.raises(TypeError, match="not a valid NetCDF 3"):
open_dataset(f, engine="scipy")
# TODO: this additional open is required since scipy seems to close the file
# when it fails on the TypeError (though didn't when we used
# `raises_regex`?). Ref https://github.com/pydata/xarray/pull/5191
with open(tmp_file, "rb") as f:
f.seek(8)
with pytest.raises(
ValueError,
match="match in any of xarray's currently installed IO",
):
> with pytest.warns(
RuntimeWarning,
match=re.escape("'h5netcdf' fails while guessing"),
):
E Failed: DID NOT WARN. No warnings of type (<class 'RuntimeWarning'>,) matching the regex were emitted.
E Regex: 'h5netcdf'\ fails\ while\ guessing
E Emitted warnings: [ UserWarning('cannot guess the engine, file-like object read/write pointer not at the start of the file, so resetting file pointer to zero. If this does not work, please close and reopen, or use a context manager'),
E RuntimeWarning("deallocating CachingFileManager(<class 'h5netcdf.core.File'>, <_io.BufferedReader name='/tmp/tmpoxdfl12i/temp-720.nc'>, mode='r', kwargs={'invalid_netcdf': None, 'decode_vlen_strings': True}, manager_id='b62ec6c8-b328-409c-bc5d-bbab265bea51'), but file is not already closed. This may indicate a bug.")]
/home/runner/work/xarray/xarray/xarray/tests/test_backends.py:3076: Failed
=========================== short test summary info ============================
FAILED xarray/tests/test_backends.py::TestH5NetCDFFileObject::test_open_twice - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_backends.py::TestH5NetCDFFileObject::test_open_fileobj - Failed: DID NOT WARN. No warnings of type (<class 'RuntimeWarning'>,) matching the regex were emitted.
Regex: 'h5netcdf'\ fails\ while\ guessing
Emitted warnings: [ UserWarning('cannot guess the engine, file-like object read/write pointer not at the start of the file, so resetting file pointer to zero. If this does not work, please close and reopen, or use a context manager'),
RuntimeWarning("deallocating CachingFileManager(<class 'h5netcdf.core.File'>, <_io.BufferedReader name='/tmp/tmpoxdfl12i/temp-720.nc'>, mode='r', kwargs={'invalid_netcdf': None, 'decode_vlen_strings': True}, manager_id='b62ec6c8-b328-409c-bc5d-bbab265bea51'), but file is not already closed. This may indicate a bug.")]
= 2 failed, 14608 passed, 1190 skipped, 203 xfailed, 73 xpassed, 54 warnings in 581.98s (0:09:41) = |
Fixes the `Failed: DID NOT RAISE <class 'ValueError'>`
The ValueError and RuntimeWarning isn't raised anymore.
xarray/core/utils.py
Outdated
| warnings.warn( | ||
| "cannot guess the engine, " | ||
| "file-like object read/write pointer not at the start of the file, " | ||
| "so resetting file pointer to zero. If this does not work, " | ||
| "please close and reopen, or use a context manager" | ||
| ) |
There was a problem hiding this comment.
Is a warning actually necessary here, or can we remove the warning and let the file pointer reset to zero silently?
Edit: warning has been removed in 929cb62
|
@martindurant do you have time to take a look here please? |
|
It loos reasonable to me. I'm not sure if the warning is needed or not - we don't expect anyone to see it, or if they do, necessarily do anything about it. It's not unusual for code interacting with a file-like object to move the file pointer. |
Hmm, in that case, I'm leaning towards removing the warning. The file pointer is reset anyway after reading the magic byte number, and that hasn't caused any issues (as mentioned in #6813 (comment)), so it should be more or less safe. Let me push another commit. Edit: done at 929cb62. |
File pointer is reset to zero after reading the magic byte number anyway, so should be ok not to warn about this.
|
Thanks @weiji14 ! |
* upstream/main: (39 commits) Support the new compression argument in netCDF4 > 1.6.0 (pydata#6981) Remove setuptools-scm-git-archive, require setuptools-scm>=7 (pydata#7253) Fix mypy failures (pydata#7343) Docs: add example of writing and reading groups to netcdf (pydata#7338) Reset file pointer to 0 when reading file stream (pydata#7304) Enable mypy warn unused ignores (pydata#7335) Optimize some copying (pydata#7209) Add parse_dims func (pydata#7051) Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` (pydata#7229) Remove code used to support h5py<2.10.0 (pydata#7334) [pre-commit.ci] pre-commit autoupdate (pydata#7330) Fix PR number in what’s new (pydata#7331) Enable `origin` and `offset` arguments in `resample` (pydata#7284) fix doctests: supress urllib3 warning (pydata#7326) fix flake8 config (pydata#7321) implement Zarr v3 spec support (pydata#6475) Fix polyval overloads (pydata#7315) deprecate pynio backend (pydata#7301) mypy - Remove some ignored packages and modules (pydata#7319) Switch to T_DataArray in .coords (pydata#7285) ...
Bump xarray from 2022.11.0 to 2022.12.0 which contains a [bugfix](pydata/xarray#7304) useful for reading multiple groups from ICESat-2 HDF5 files in AWS S3 buckets. Also taking the opportunity to sort packages alphabetically in the `environment.yml` and reorganize some sections originally added in #2. Hopefully this will make it clearer on where new conda packages can be added in the future!
Instead of raising a ValueError about the file pointer not being at the start of the file, reset the file pointer automatically to zero, and warn that the pointer has been reset.
whats-new.rstapi.rst