Fix #741: Memory pickling was broken with mmap_mode != None#743
Fix #741: Memory pickling was broken with mmap_mode != None#743jnothman wants to merge 5 commits intojoblib:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #743 +/- ##
==========================================
+ Coverage 95.23% 95.26% +0.02%
==========================================
Files 42 42
Lines 6128 6144 +16
==========================================
+ Hits 5836 5853 +17
+ Misses 292 291 -1
Continue to review full report at Codecov.
|
| memory2 = pickle.loads(pickle.dumps(memory)) | ||
| assert memory.store_backend.location == memory2.store_backend.location | ||
|
|
||
| # Fix regression in #741 |
There was a problem hiding this comment.
Maybe use the issue full URL (https://github.com/joblib/joblib/issues/741) instead of just the id ?
|
Thanks @jnothman ! Apart the minor comment regarding issue URL in the test, LGTM. |
|
I actually think the test should be stronger and check attribute
equivalence.
|
|
It'd be great to have this merged and released |
+1 As far as I understood this issue makes CI fails on scikit-learn since joblib 0.12.1 was bundled (cf scikit-learn/scikit-learn#11771) |
| def __init__(self, location=None, backend='local', cachedir=None, | ||
| mmap_mode=None, compress=False, verbose=1, bytes_limit=None, | ||
| backend_options={}): | ||
| def __init__(self, location=None, backend='local', mmap_mode=None, |
There was a problem hiding this comment.
I think the problem is more general than this. I reckon the fix is to keep the arguments order as they were in 0.11 otherwise we will break Memory pickles between 0.11 and 0.12 with no particularly good reason.
scikit-learn 0.19 has a memory argument in Pipeline so this could affect pickled Pipelines between scikit-learn 0.19 and 0.20.
With joblib 0.11:
import pickle
from joblib import __version__
from joblib import Memory
mem = Memory(cachedir='/tmp/test')
print(pickle.dumps(mem))
# b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.'Reloading in 0.12+ (even with the current fix from this PR):
import pickle
pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')/home/lesteve/miniconda3/bin/ipython:2: UserWarning: Compressed results cannot be memmapped
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-926293d23d86> in <module>()
1 import pickle
----> 2 pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')
~/dev/joblib/joblib/memory.py in __init__(self, location, backend, mmap_mode, compress, verbose, bytes_limit, backend_options, cachedir)
845 backend, location, verbose=self._verbose,
846 backend_options=dict(compress=compress, mmap_mode=mmap_mode,
--> 847 **backend_options))
848
849 @property
~/dev/joblib/joblib/memory.py in _store_backend_factory(backend, location, verbose, backend_options)
115 if obj is None:
116 raise TypeError('Unknown location {0} or backend {1}'.format(
--> 117 location, backend))
118
119 # The store backend is configured with the extra named parameters,
TypeError: Unknown location /tmp/test/joblib or backend None
There was a problem hiding this comment.
I reckon the fix is to keep the arguments order as they were in 0.11 otherwise we will break Memory pickles between 0.11 and 0.12 with no particularly good reason.
I guess actually you should keep location (rather than cachedir) as the first argument.
There was a problem hiding this comment.
We should probably change __reduce__ to use arguments via a dictionary rather than positionally to avoid this kind of hassles in the future ...
|
Note: this This definitely needs to be looked at in more details ... Before thinking at doing a joblib release we should double-check that the problematic scikit-learn examples pass on joblib master. For example even with the current fix, this fails (this may or may not be related to export SKLEARN_SITE_JOBLIB=1
export PYTHONPATH=~/dev/joblib # adapt this or install joblib master in your environment
ipython examples/cluster/plot_feature_agglomeration_vs_univariate_selection.pyFull stack-trace |
|
You're right that |
|
A passes the test here, but the cross-version compatibility of pickles is something that would need to be dealt with in This dict-based approach has the limitation that if there were to be another change of parameter name, it would be hard to handle cross-version compatibility, because restoring the object involves running Should we be excluding joblib 0.12 from scikit-learn 0.20RC? |
Not sure what the timeline for the RC is but these problems seem fixable. I'll try to take a look shortly. There were a lot of improvements in joblib that went in between 0.11 and 0.12 (Loky backend that will hopefully get rid of of multiprocessing freezes, work on dask distributed backend, store backend for Memory, etc ...) and we probably want it for scikit-learn 0.20. |
|
(Is the Loky backend now the default? Is it stable? Should it be mentioned
in the Scikit-learn what's new?)
|
Yes
There were some recent fixes for the stability problems that were found so far. I am definitely not the expert on this though.
Probably if this is not mentioned already. |
I have confirmed that this test fails at master.
Fix #741.