
Fix #741: Memory pickling was broken with mmap_mode != None #743

Closed
jnothman wants to merge 5 commits into joblib:master from jnothman:cachedir-pos

Conversation

@jnothman
Contributor

jnothman commented Aug 9, 2018

I have confirmed that this test fails at master.

Fix #741.

@codecov

codecov bot commented Aug 9, 2018

Codecov Report

Merging #743 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #743      +/-   ##
==========================================
+ Coverage   95.23%   95.26%   +0.02%     
==========================================
  Files          42       42              
  Lines        6128     6144      +16     
==========================================
+ Hits         5836     5853      +17     
+ Misses        292      291       -1
Impacted Files Coverage Δ
joblib/_store_backends.py 90.57% <100%> (+0.09%) ⬆️
joblib/memory.py 95.34% <100%> (-0.03%) ⬇️
joblib/test/test_memory.py 97.56% <100%> (+0.06%) ⬆️
joblib/_parallel_backends.py 97.2% <0%> (+0.4%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ec4f1ca...ca6ddc3.

memory2 = pickle.loads(pickle.dumps(memory))
assert memory.store_backend.location == memory2.store_backend.location

# Fix regression in #741
Contributor

@aabadie Aug 9, 2018


Maybe use the full issue URL (https://github.com/joblib/joblib/issues/741) instead of just the id?

@aabadie
Contributor

aabadie commented Aug 9, 2018

Thanks @jnothman! Apart from the minor comment regarding the issue URL in the test, LGTM.

@jnothman
Contributor Author

jnothman commented Aug 9, 2018 via email

@jnothman
Contributor Author

It'd be great to have this merged and released

@rth
Contributor

rth commented Aug 13, 2018

It'd be great to have this merged and released

+1. As far as I understand, this issue makes CI fail on scikit-learn since joblib 0.12.1 was bundled (cf. scikit-learn/scikit-learn#11771).

    def __init__(self, location=None, backend='local', cachedir=None,
                 mmap_mode=None, compress=False, verbose=1, bytes_limit=None,
                 backend_options={}):
    def __init__(self, location=None, backend='local', mmap_mode=None,
Member


I think the problem is more general than this. I reckon the fix is to keep the argument order as it was in 0.11, otherwise we will break Memory pickles between 0.11 and 0.12 with no particularly good reason.

scikit-learn 0.19 has a memory argument in Pipeline, so this could affect pickled Pipelines between scikit-learn 0.19 and 0.20.

With joblib 0.11:

import pickle

from joblib import __version__
from joblib import Memory

mem = Memory(cachedir='/tmp/test')

print(pickle.dumps(mem)) 
# b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.'
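As an aside (not part of the original comment): the positional nature of that payload is visible if you disassemble it with the stdlib's pickletools, which does not need joblib installed. The MARK/TUPLE/REDUCE opcode sequence shows Memory being called with the purely positional tuple ('/tmp/test', None, False, 1), so argument meaning is entirely order-dependent.

```python
import pickletools

# The exact payload pasted above (written by joblib 0.11).
payload = (b'\x80\x03cjoblib.memory\nMemory\nq\x00'
           b'(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')

# Prints one line per opcode: GLOBAL 'joblib.memory Memory', then the
# MARK ... TUPLE of constructor arguments, then REDUCE (the call itself).
pickletools.dis(payload)
```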

Reloading in 0.12+ (even with the current fix from this PR):

import pickle
pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')
/home/lesteve/miniconda3/bin/ipython:2: UserWarning: Compressed results cannot be memmapped
  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-926293d23d86> in <module>()
      1 import pickle
----> 2 pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')

~/dev/joblib/joblib/memory.py in __init__(self, location, backend, mmap_mode, compress, verbose, bytes_limit, backend_options, cachedir)
    845             backend, location, verbose=self._verbose,
    846             backend_options=dict(compress=compress, mmap_mode=mmap_mode,
--> 847                                  **backend_options))
    848 
    849     @property

~/dev/joblib/joblib/memory.py in _store_backend_factory(backend, location, verbose, backend_options)
    115         if obj is None:
    116             raise TypeError('Unknown location {0} or backend {1}'.format(
--> 117                             location, backend))
    118 
    119         # The store backend is configured with the extra named parameters,

TypeError: Unknown location /tmp/test/joblib or backend None
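The cross-version fragility described above can be reproduced with a toy class, using only the stdlib. All names here are illustrative, not joblib's actual 0.11/0.12 code; redefining the class in the same module simulates upgrading the library between dump and load.

```python
import pickle

class Memory:
    """Toy stand-in for a 0.11-style class (hypothetical signature)."""
    def __init__(self, cachedir=None, mmap_mode=None, compress=False, verbose=1):
        self.cachedir = cachedir
        self.mmap_mode = mmap_mode
        self.compress = compress
        self.verbose = verbose

    def __reduce__(self):
        # Positional args: the pickle bakes in the old parameter order.
        return (self.__class__, (self.cachedir, self.mmap_mode,
                                 self.compress, self.verbose))

old_payload = pickle.dumps(Memory(cachedir='/tmp/test'))

# Simulate the upgrade: same class name, but the new signature inserted
# `backend` in second position and renamed cachedir to location.
class Memory:
    def __init__(self, location=None, backend='local', mmap_mode=None,
                 compress=False):
        self.location = location
        self.backend = backend
        self.mmap_mode = mmap_mode
        self.compress = compress

m = pickle.loads(old_payload)
print(m.location)  # /tmp/test -- lands in the right slot only by accident
print(m.backend)   # None -- the old mmap_mode value slid into `backend`
```

This is the same shape of failure as the traceback above: the old positional tuple is replayed against the reordered signature, so a None intended for mmap_mode ends up as the backend.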

Member


I reckon the fix is to keep the argument order as it was in 0.11, otherwise we will break Memory pickles between 0.11 and 0.12 with no particularly good reason.

I guess actually you should keep location (rather than cachedir) as the first argument.

Member


We should probably change __reduce__ to pass arguments via a dictionary rather than positionally, to avoid this kind of hassle in the future ...

@lesteve
Member

lesteve commented Aug 13, 2018

Note: this __reduce__ problem (inconsistency between __init__ and __reduce__) seems to exist in other places too, e.g. in MemorizedFunc. I think we need to look at other similar changes following the store backend addition.

This definitely needs to be looked at in more detail ...

Before thinking about doing a joblib release, we should double-check that the problematic scikit-learn examples pass on joblib master. For example, even with the current fix, this fails (it may or may not be related to the __reduce__/__init__ inconsistency in MemorizedFunc; I haven't tracked this one down fully yet):

export SKLEARN_SITE_JOBLIB=1
export PYTHONPATH=~/dev/joblib # adapt this or install joblib master in your environment
ipython examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py
Full stack-trace
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/dev/scikit-learn/examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py in <module>()
     84 # Select the optimal percentage of features with grid search
     85 clf = GridSearchCV(clf, {'anova__percentile': [5, 10, 20]}, cv=cv)
---> 86 clf.fit(X, y)  # set the best parameters
     87 coef_ = clf.best_estimator_.steps[-1][1].coef_
     88 coef_ = clf.best_estimator_.steps[0][1].inverse_transform(coef_.reshape(1, -1))

~/dev/scikit-learn/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    720                 return results_container[0]
    721 
--> 722             self._run_search(evaluate_candidates)
    723 
    724         results = results_container[0]

~/dev/scikit-learn/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1174     def _run_search(self, evaluate_candidates):
   1175         """Search all candidates in param_grid"""
-> 1176         evaluate_candidates(ParameterGrid(self.param_grid))
   1177 
   1178 

~/dev/scikit-learn/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
    709                                for parameters, (train, test)
    710                                in product(candidate_params,
--> 711                                           cv.split(X, y, groups)))
    712 
    713                 all_candidate_params.extend(candidate_params)

~/dev/joblib/joblib/parallel.py in __call__(self, iterable)
    979             # remaining jobs.
    980             self._iterating = False
--> 981             if self.dispatch_one_batch(iterator):
    982                 self._iterating = self._original_iterator is not None
    983 

~/dev/joblib/joblib/parallel.py in dispatch_one_batch(self, iterator)
    821                 return False
    822             else:
--> 823                 self._dispatch(tasks)
    824                 return True
    825 

~/dev/joblib/joblib/parallel.py in _dispatch(self, batch)
    778         with self._lock:
    779             job_idx = len(self._jobs)
--> 780             job = self._backend.apply_async(batch, callback=cb)
    781             # A job can complete so quickly than its callback is
    782             # called before we get here, causing self._jobs to

~/dev/joblib/joblib/_parallel_backends.py in apply_async(self, func, callback)
    181     def apply_async(self, func, callback=None):
    182         """Schedule a func to be run"""
--> 183         result = ImmediateResult(func)
    184         if callback:
    185             callback(result)

~/dev/joblib/joblib/_parallel_backends.py in __init__(self, batch)
    541         # Don't delay the application, to avoid keeping the input
    542         # arguments in memory
--> 543         self.results = batch()
    544 
    545     def get(self):

~/dev/joblib/joblib/parallel.py in __call__(self)
    259         with parallel_backend(self._backend):
    260             return [func(*args, **kwargs)
--> 261                     for func, args, kwargs in self.items]
    262 
    263     def __len__(self):

~/dev/joblib/joblib/parallel.py in <listcomp>(.0)
    259         with parallel_backend(self._backend):
    260             return [func(*args, **kwargs)
--> 261                     for func, args, kwargs in self.items]
    262 
    263     def __len__(self):

~/dev/scikit-learn/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)             
    522             estimator.fit(X_train, **fit_params)
    523         else:
--> 524             estimator.fit(X_train, y_train, **fit_params)
    525 
    526     except Exception as e:

~/dev/scikit-learn/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    263             This estimator
    264         """
--> 265         Xt, fit_params = self._fit(X, y, **fit_params)
    266         if self._final_estimator is not None:
    267             self._final_estimator.fit(Xt, y, **fit_params)

~/dev/scikit-learn/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    228                 Xt, fitted_transformer = fit_transform_one_cached(
    229                     cloned_transformer, Xt, y, None,
--> 230                     **fit_params_steps[name])
    231                 # Replace the transformer of the step with the fitted                                                        
    232                 # transformer. This is necessary when loading the transformer

~/dev/joblib/joblib/memory.py in __call__(self, *args, **kwargs)
    320 
    321     def __call__(self, *args, **kwargs):
--> 322         return self.func(*args, **kwargs)
    323 
    324     def call_and_shelve(self, *args, **kwargs):

~/dev/scikit-learn/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, **fit_params)
    612 def _fit_transform_one(transformer, X, y, weight, **fit_params):
    613     if hasattr(transformer, 'fit_transform'):
--> 614         res = transformer.fit_transform(X, y, **fit_params)
    615     else:
    616         res = transformer.fit(X, y, **fit_params).transform(X)

~/dev/scikit-learn/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    460         else:
    461             # fit method of arity 2 (supervised transformation)
--> 462             return self.fit(X, y, **fit_params).transform(X)
    463 
    464 

~/dev/scikit-learn/sklearn/feature_selection/univariate_selection.py in fit(self, X, y)
    347 
    348         self._check_params(X, y)
--> 349         score_func_ret = self.score_func(X, y)
    350         if isinstance(score_func_ret, (list, tuple)):
    351             self.scores_, self.pvalues_ = score_func_ret

~/dev/joblib/joblib/memory.py in __call__(self, *args, **kwargs)
    509 
    510     def __call__(self, *args, **kwargs):
--> 511         return self._cached_call(args, kwargs)[0]
    512 
    513     def __reduce__(self):

~/dev/joblib/joblib/memory.py in _cached_call(self, args, kwargs)
    440         # Compare the function code with the previous to see if the
    441         # function code has changed
--> 442         func_id, args_id = self._get_output_identifiers(*args, **kwargs)
    443         metadata = None
    444         msg = None

~/dev/joblib/joblib/memory.py in _get_output_identifiers(self, *args, **kwargs)
    530         """Return the func identifier and input parameter hash of a result."""
    531         func_id = _build_func_identifier(self.func)
--> 532         argument_hash = self._get_argument_hash(*args, **kwargs)
    533         return func_id, argument_hash
    534 

~/dev/joblib/joblib/memory.py in _get_argument_hash(self, *args, **kwargs)
    524 
    525     def _get_argument_hash(self, *args, **kwargs):
--> 526         return hashing.hash(filter_args(self.func, self.ignore, args, kwargs),
    527                             coerce_mmap=(self.mmap_mode is not None))
    528 

~/dev/joblib/joblib/func_inspect.py in filter_args(func, ignore_lst, args, kwargs)
    306 
    307     # Now remove the arguments to be ignored
--> 308     for item in ignore_lst:
    309         if item in arg_dict:
    310             arg_dict.pop(item)

TypeError: 'bool' object is not iterable

@jnothman
Contributor Author

You're right that __reduce__ using kwargs is probably more robust. I'm not sure yet what's happening in the pickle-in-0.11-restore-in-0.12 bug, though.

@jnothman
Contributor Author

A __reduce__ like:

        return (self.__class__, (), {k: v for k, v in vars(self).items()
                                     if k != 'timestamp'})

passes the test here, but the cross-version compatibility of pickles is something that would need to be dealt with in __init__. I would need to move backend to after bytes_limit if that was a goal.

This dict-based approach has the limitation that if a parameter were renamed again, cross-version compatibility would be hard to handle, because restoring the object involves running __init__ and then setting the __dict__ directly.
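For reference (not part of the PR itself), the three-element __reduce__ return value sketched above follows plain pickle semantics: the callable is invoked with the (empty) argument tuple, and, absent a __setstate__, the third element is applied to the new object's __dict__. A minimal toy class (hypothetical names):

```python
import pickle

class Cache:
    """Toy class using a dict-based __reduce__ (illustrative only)."""
    def __init__(self, location=None, verbose=1):
        self.location = location
        self.verbose = verbose
        self.timestamp = 123.0  # transient: deliberately excluded from pickles

    def __reduce__(self):
        # (callable, args, state): pickle calls Cache() with defaults, then --
        # since there is no __setstate__ -- updates __dict__ with `state`.
        return (self.__class__, (), {k: v for k, v in vars(self).items()
                                     if k != 'timestamp'})

c = pickle.loads(pickle.dumps(Cache(location='/tmp/test', verbose=0)))
print(c.location, c.verbose)  # /tmp/test 0
print(c.timestamp)            # 123.0 -- __init__ ran first and re-set it
```

Because the state dict carries raw attribute names, a later rename of an attribute would leave old pickles setting a stale name on the new object, which is exactly the cross-version limitation noted above.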

Should we be excluding joblib 0.12 from scikit-learn 0.20RC?

@lesteve
Member

lesteve commented Aug 14, 2018

Should we be excluding joblib 0.12 from scikit-learn 0.20RC?

Not sure what the timeline for the RC is but these problems seem fixable. I'll try to take a look shortly.

There were a lot of improvements in joblib between 0.11 and 0.12 (the Loky backend, which will hopefully get rid of multiprocessing freezes; work on the dask distributed backend; store backends for Memory; etc.), and we probably want them for scikit-learn 0.20.

@jnothman
Contributor Author

jnothman commented Aug 14, 2018 via email

@lesteve
Member

lesteve commented Aug 14, 2018

Is the Loky backend now the default?

Yes

Is it stable?

There were some recent fixes for the stability problems that were found so far. I am definitely not the expert on this though.

Should it be mentioned in the Scikit-learn what's new?

Probably if this is not mentioned already.

@lesteve
Member

lesteve commented Aug 14, 2018

Sorry @jnothman, I didn't see that you had pushed some commits, so I worked on a different PR, #746. Feedback more than welcome.
