FIX pickle roundtrip for Memory and related classes.#746
FIX pickle roundtrip for Memory and related classes.#746ogrisel merged 2 commits intojoblib:masterfrom
Conversation
|
@aabadie if you are around and have some spare bandwidth, I would definitely be interested to have your opinion on this! I may have overlooked some important reasons about why |
Codecov Report
@@ Coverage Diff @@
## master #746 +/- ##
==========================================
- Coverage 95.4% 95.36% -0.05%
==========================================
Files 43 43
Lines 6160 6191 +31
==========================================
+ Hits 5877 5904 +27
- Misses 283 287 +4
Continue to review full report at Codecov.
|
Just to elaborate on this, let's take the example of
memory = Memory(location='/my/location')
pipe = Pipeline(memory=memory, ...)
func_cached = memory.cache(func)
func_cached(pipe)you don't want the hash of
memory = Memory(location='/my/location', verbose=10)
memory_copy = copy.deepcopy(memory)
assert memory_copy.verbose == 10 # this seems like a very reasonable expectationYou need |
jnothman
left a comment
There was a problem hiding this comment.
If a commit added an attribute, would your tests fail?
jnothman
left a comment
There was a problem hiding this comment.
Sorry, I misunderstood. Yes, they would.
jnothman
left a comment
There was a problem hiding this comment.
Yes, your tests do cover the addition of attributes.
I think this is more complex than it should be. I don't think we need to introduce _reconstruction_from_parameters_dict. I don't think we should change the stored attributes on the objects just to make pickling easier to manage. And I wonder if the tests are too complicated to read: what are the practical differences between this and my tests?
This also appears harder to maintain than my PR, in that any change to Memory or MemorizedFunc's __init__ args will result in a test failure. Yes, this is more explicit, but to what end?
joblib/memory.py
Outdated
| self.timestamp = time.time() | ||
| self.bytes_limit = bytes_limit | ||
| self.backend = backend | ||
| self.location = location |
There was a problem hiding this comment.
Why do you want to change the attributes?
joblib/memory.py
Outdated
| self.mmap_mode = mmap_mode | ||
| self.compress = compress | ||
| self.func = func | ||
| self.location = location |
There was a problem hiding this comment.
Why do you want to change the attributes?
I tried using
I have reduced attribute changes to a minimum. The only thing I have kept is
IIRC you were only testing |
e38c589 to
6c9e06f
Compare
|
This does not solve the problem of piclkle in 0.11 and load in 0.12 though: |
|
@lesteve restoring the diff --git a/joblib/memory.py b/joblib/memory.py
index 016554a..eac15b7 100644
--- a/joblib/memory.py
+++ b/joblib/memory.py
@@ -809,9 +809,9 @@ class Memory(Logger):
# Public interface
# ------------------------------------------------------------------------
- def __init__(self, location=None, backend='local', cachedir=None,
- mmap_mode=None, compress=False, verbose=1, bytes_limit=None,
- backend_options=None):
+ def __init__(self, location=None, mmap_mode=None, compress=False,
+ verbose=1, bytes_limit=None, backend='local',
+ backend_options=None, cachedir=None):
# XXX: Bad explanation of the None value of cachedir
Logger.__init__(self)
self._verbose = verboseand all the test still pass along with the scikit-learn example as explained in #743 (comment). |
There was a problem hiding this comment.
This looks good to me once the ordering of the init args has been restored to enable compat with 0.11 objects. Please also include the 0.11 loads case as a new test:
import pickle
memory_joblib011 = pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')
compare(memory_joblib011, Memory(location='/dev/test'),
ignored_attrs='timestamp')while including the joblib 0.11 snippet that generated the pickle in a comment:
import pickle
from joblib import Memory
mem = Memory(cachedir='/tmp/test')
print(pickle.dumps(mem)) |
Thanks for providing a fix @lesteve. Using set_state looks more robust than reduce.
The changes here may impact related projects that already provide custom store backends, like joblib-s3 or joblib-hadoop. They will probably need an update (not verified yet) |
d4242ae to
168fca0
Compare
|
Rebased. |
|
Adding an entry in the changelog would be great. |
|
I added an entry in CHANGES.rst. Feel free to improve it! |
|
Ok merged. Thank you very much @lesteve! |
* tag '0.12.3': (23 commits) Release 0.12.3 Loky 2.2.1 (joblib#760) FIX: FileSystemStoreBackend string representation only returning location (path) (joblib#765) Add optional dependency on psutil MAINT remove brittle time based assertion in test (joblib#761) Fix a bug in nesting level computation with FallbackToBackend(SequentialBackend()) (joblib#759) Make docstring more consistent with project style Improved performance of call_and_shelve (joblib#757) Better test name Fix default context handling (joblib#754) cloudpickle 0.5.5 (joblib#756) Fixed error where filter args was consuming kwargs (joblib#750) FIX pickle roundtrip for Memory and related classes. (joblib#746) test that passes but timeout appveyor because of unclosed semaphore tracker (joblib#676) DOC: fine tune compressor example dataset size for ReadTheDocs (joblib#753) FIX: MemorizedResult not picklable (joblib#752) [MRG] Better message with py27 when lz4 is not installed (joblib#740) MNT remove mutable default value for backend_options parameter (joblib#748) MNT create test file for _store_backends module (joblib#749) DOC consistently use memory rather than mem in memory.rst (joblib#744) ...
* releases: (23 commits) Release 0.12.3 Loky 2.2.1 (joblib#760) FIX: FileSystemStoreBackend string representation only returning location (path) (joblib#765) Add optional dependency on psutil MAINT remove brittle time based assertion in test (joblib#761) Fix a bug in nesting level computation with FallbackToBackend(SequentialBackend()) (joblib#759) Make docstring more consistent with project style Improved performance of call_and_shelve (joblib#757) Better test name Fix default context handling (joblib#754) cloudpickle 0.5.5 (joblib#756) Fixed error where filter args was consuming kwargs (joblib#750) FIX pickle roundtrip for Memory and related classes. (joblib#746) test that passes but timeout appveyor because of unclosed semaphore tracker (joblib#676) DOC: fine tune compressor example dataset size for ReadTheDocs (joblib#753) FIX: MemorizedResult not picklable (joblib#752) [MRG] Better message with py27 when lz4 is not installed (joblib#740) MNT remove mutable default value for backend_options parameter (joblib#748) MNT create test file for _store_backends module (joblib#749) DOC consistently use memory rather than mem in memory.rst (joblib#744) ...
Fix #741. Close #743.
A few things:
Memorypickle being broken between joblib 0.11 and 0.12 mentioned in Fix #741: Memory pickling was broken with mmap_mode != None #743 (comment). This potentially affects also scikit-learn 0.19 to 0.20 if you pickled estimators that use Memory (e.g. Pipeline, AgglomerativeClustering). This could be argued that we don't guarantee pickles compatibility between scikit-learn versions anyway. The error is quite misleading but at least this is an error and not a silent change ...__reduce__. This PR is leaning towards cloning quite heavily but maybe a bit too much ...