[MRG] simplify loading of compressed files#400
Conversation
92548f0 to
5b566f9
Compare
joblib/numpy_pickle_utils.py
Outdated
| if PY3_OR_LATER and not fileobj.seekable(): | ||
| # With Python 3 file object have to be able to seek explicitely. | ||
| raise ValueError("Input fileobject must be seekable.") | ||
| compressor = _detect_compressor(fileobj) |
There was a problem hiding this comment.
I am +1 for moving the _detect_compressor here, but I am not sure what the added value is of requiring that .seekable() return True.
What kind of error do we get with a file that is not seekable?
There was a problem hiding this comment.
Indeed I don't understand why we check this here and how it can possibly work in Python 2.
There was a problem hiding this comment.
Indeed I don't understand why we check this here and how it can possibly work in Python 2.
seekable may not exist or return False (as it was the case for BinaryZlibFile before the PR just merged) and yet .seek be completely functional.
An example for Python 2.7:
f = open('/tmp/bla', 'w')
f.seekable() # AttributeError: 'file' object has no attribute 'seekable'
f.seek(3) # works fine
joblib/numpy_pickle_utils.py
Outdated
| # the very beginning of the file. | ||
| if PY3_OR_LATER and not fileobj.seekable(): | ||
| # With Python 3 file object have to be able to seek explicitely. | ||
| raise ValueError("Input fileobject must be seekable.") |
There was a problem hiding this comment.
Please include the string representation of the fileobject in the message to make it more informative:
raise ValueError("Cannot load from non-seekable fileobject %r" % fileobj)
joblib/test/test_numpy_pickle.py
Outdated
| np.testing.assert_array_equal(array_reloaded, test_array) | ||
|
|
||
|
|
||
| def test_pickling_in_non_seekable_fileobject_raise_error(): |
There was a problem hiding this comment.
More accurate test name: test_loading_from_non_seekable_fileobject_raise_error
joblib/numpy_pickle_utils.py
Outdated
| if PY3_OR_LATER and not fileobj.seekable(): | ||
| # With Python 3 file object have to be able to seek explicitely. | ||
| raise ValueError("Input fileobject must be seekable.") | ||
| compressor = _detect_compressor(fileobj) |
There was a problem hiding this comment.
Indeed I don't understand why we check this here and how it can possibly work in Python 2.
5b566f9 to
0a261fb
Compare
|
After some discussion IRL, I came out to another refactoring of the compression detection when loading pickles:
It doesn't change the actual behavior, just make the code cleaner. |
joblib/numpy_pickle_utils.py
Outdated
| # objects (GzipFile, BZ2File, LzmaFile), we simply return it. | ||
| if isinstance(fileobj, tuple(_COMPRESSOR_CLASSES)): | ||
| yield fileobj | ||
| pass |
There was a problem hiding this comment.
Actually I think we should remove this case now. This would allow for nested compression no?
ogrisel
left a comment
There was a problem hiding this comment.
This looks great. The logic is much more straightforward to read and understand.
This is a small optimization. Maybe we could check if the passed fileobj is seekable before detecting the compressor and raise an error if it's not the case.
Related to #398.