
Fix #741: Memory pickling was broken with mmap_mode != None #743

Closed
jnothman wants to merge 5 commits into joblib:master from jnothman:cachedir-pos

Conversation

@jnothman
Contributor

jnothman commented Aug 9, 2018

I have confirmed that this test fails at master.

Fix #741.

@codecov

codecov bot commented Aug 9, 2018

Codecov Report

Merging #743 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #743      +/-   ##
==========================================
+ Coverage   95.23%   95.26%   +0.02%     
==========================================
  Files          42       42              
  Lines        6128     6144      +16     
==========================================
+ Hits         5836     5853      +17     
+ Misses        292      291       -1
Impacted Files Coverage Δ
joblib/_store_backends.py 90.57% <100%> (+0.09%) ⬆️
joblib/memory.py 95.34% <100%> (-0.03%) ⬇️
joblib/test/test_memory.py 97.56% <100%> (+0.06%) ⬆️
joblib/_parallel_backends.py 97.2% <0%> (+0.4%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ec4f1ca...ca6ddc3.

memory2 = pickle.loads(pickle.dumps(memory))
assert memory.store_backend.location == memory2.store_backend.location

# Fix regression in #741
Contributor

@aabadie Aug 9, 2018


Maybe use the full issue URL (https://github.com/joblib/joblib/issues/741) instead of just the id?

@aabadie
Contributor

aabadie commented Aug 9, 2018

Thanks @jnothman! Apart from the minor comment regarding the issue URL in the test, LGTM.

@jnothman
Contributor Author

jnothman commented Aug 9, 2018 via email

@jnothman
Contributor Author

It'd be great to have this merged and released

@rth
Contributor

rth commented Aug 13, 2018

It'd be great to have this merged and released

+1. As far as I understand, this issue makes CI fail on scikit-learn since joblib 0.12.1 was bundled (cf. scikit-learn/scikit-learn#11771).

    def __init__(self, location=None, backend='local', cachedir=None,
                 mmap_mode=None, compress=False, verbose=1, bytes_limit=None,
                 backend_options={}):
    def __init__(self, location=None, backend='local', mmap_mode=None,
Member


I think the problem is more general than this. I reckon the fix is to keep the argument order as it was in 0.11, otherwise we will break Memory pickles between 0.11 and 0.12 with no particularly good reason.

scikit-learn 0.19 has a memory argument in Pipeline, so this could affect pickled Pipelines between scikit-learn 0.19 and 0.20.

With joblib 0.11:

import pickle

from joblib import __version__
from joblib import Memory

mem = Memory(cachedir='/tmp/test')

print(pickle.dumps(mem)) 
# b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.'
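As an aside (not part of the original comment): the positional nature of that payload is visible if you disassemble it with the stdlib's pickletools, which does not need joblib installed. The MARK/TUPLE/REDUCE opcode sequence shows Memory being called with the purely positional tuple ('/tmp/test', None, False, 1), so argument meaning is entirely order-dependent.

```python
import pickletools

# The exact payload pasted above (written by joblib 0.11).
payload = (b'\x80\x03cjoblib.memory\nMemory\nq\x00'
           b'(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')

# Prints one line per opcode: GLOBAL 'joblib.memory Memory', then the
# MARK ... TUPLE of constructor arguments, then REDUCE (the call itself).
pickletools.dis(payload)
```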

Reloading in 0.12+ (even with the current fix from this PR):

import pickle
pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')
/home/lesteve/miniconda3/bin/ipython:2: UserWarning: Compressed results cannot be memmapped
  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-926293d23d86> in <module>()
      1 import pickle
----> 2 pickle.loads(b'\x80\x03cjoblib.memory\nMemory\nq\x00(X\t\x00\x00\x00/tmp/testq\x01N\x89K\x01tq\x02Rq\x03.')

~/dev/joblib/joblib/memory.py in __init__(self, location, backend, mmap_mode, compress, verbose, bytes_limit, backend_options, cachedir)
    845             backend, location, verbose=self._verbose,
    846             backend_options=dict(compress=compress, mmap_mode=mmap_mode,
--> 847                                  **backend_options))
    848 
    849     @property

~/dev/joblib/joblib/memory.py in _store_backend_factory(backend, location, verbose, backend_options)
    115         if obj is None:
    116             raise TypeError('Unknown location {0} or backend {1}'.format(
--> 117                             location, backend))
    118 
    119         # The store backend is configured with the extra named parameters,

TypeError: Unknown location /tmp/test/joblib or backend None
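The cross-version fragility described above can be reproduced with a toy class, using only the stdlib. All names here are illustrative, not joblib's actual 0.11/0.12 code; redefining the class in the same module simulates upgrading the library between dump and load.

```python
import pickle

class Memory:
    """Toy stand-in for a 0.11-style class (hypothetical signature)."""
    def __init__(self, cachedir=None, mmap_mode=None, compress=False, verbose=1):
        self.cachedir = cachedir
        self.mmap_mode = mmap_mode
        self.compress = compress
        self.verbose = verbose

    def __reduce__(self):
        # Positional args: the pickle bakes in the old parameter order.
        return (self.__class__, (self.cachedir, self.mmap_mode,
                                 self.compress, self.verbose))

old_payload = pickle.dumps(Memory(cachedir='/tmp/test'))

# Simulate the upgrade: same class name, but the new signature inserted
# `backend` in second position and renamed cachedir to location.
class Memory:
    def __init__(self, location=None, backend='local', mmap_mode=None,
                 compress=False):
        self.location = location
        self.backend = backend
        self.mmap_mode = mmap_mode
        self.compress = compress

m = pickle.loads(old_payload)
print(m.location)  # /tmp/test -- lands in the right slot only by accident
print(m.backend)   # None -- the old mmap_mode value slid into `backend`
```

This is the same shape of failure as the traceback above: the old positional tuple is replayed against the reordered signature, so a None intended for mmap_mode ends up as the backend.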

Member


I reckon the fix is to keep the argument order as it was in 0.11, otherwise we will break Memory pickles between 0.11 and 0.12 with no particularly good reason.

I guess actually you should keep location (rather than cachedir) as the first argument.

Member


We should probably change __reduce__ to pass arguments via a dictionary rather than positionally, to avoid this kind of hassle in the future ...

@lesteve
Member

lesteve commented Aug 13, 2018

Note: this __reduce__ problem (inconsistency between __init__ and __reduce__) seems to exist in other places too, e.g. in MemorizedFunc. I think we need to look at other similar changes following the store backend addition.

This definitely needs to be looked at in more detail ...

Before thinking about doing a joblib release, we should double-check that the problematic scikit-learn examples pass on joblib master. For example, even with the current fix, this fails (it may or may not be related to the __reduce__/__init__ inconsistency in MemorizedFunc; I haven't tracked this one down fully yet):

export SKLEARN_SITE_JOBLIB=1
export PYTHONPATH=~/dev/joblib # adapt this or install joblib master in your environment
ipython examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py
Full stack-trace
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/dev/scikit-learn/examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py in <module>()
     84 # Select the optimal percentage of features with grid search
     85 clf = GridSearchCV(clf, {'anova__percentile': [5, 10, 20]}, cv=cv)
---> 86 clf.fit(X, y)  # set the best parameters
     87 coef_ = clf.best_estimator_.steps[-1][1].coef_
     88 coef_ = clf.best_estimator_.steps[0][1].inverse_transform(coef_.reshape(1, -1))

~/dev/scikit-learn/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    720                 return results_container[0]
    721 
--> 722             self._run_search(evaluate_candidates)
    723 
    724         results = results_container[0]

~/dev/scikit-learn/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
   1174     def _run_search(self, evaluate_candidates):
   1175         """Search all candidates in param_grid"""
-> 1176         evaluate_candidates(ParameterGrid(self.param_grid))
   1177 
   1178 

~/dev/scikit-learn/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
    709                                for parameters, (train, test)
    710                                in product(candidate_params,
--> 711                                           cv.split(X, y, groups)))
    712 
    713                 all_candidate_params.extend(candidate_params)

~/dev/joblib/joblib/parallel.py in __call__(self, iterable)
    979             # remaining jobs.
    980             self._iterating = False
--> 981             if self.dispatch_one_batch(iterator):
    982                 self._iterating = self._original_iterator is not None
    983 

~/dev/joblib/joblib/parallel.py in dispatch_one_batch(self, iterator)
    821                 return False
    822             else:
--> 823                 self._dispatch(tasks)
    824                 return True
    825 

~/dev/joblib/joblib/parallel.py in _dispatch(self, batch)
    778         with self._lock:
    779             job_idx = len(self._jobs)
--> 780             job = self._backend.apply_async(batch, callback=cb)
    781             # A job can complete so quickly than its callback is
    782             # called before we get here, causing self._jobs to

~/dev/joblib/joblib/_parallel_backends.py in apply_async(self, func, callback)
    181     def apply_async(self, func, callback=None):
    182         """Schedule a func to be run"""
--> 183         result = ImmediateResult(func)
    184         if callback:
    185             callback(result)

~/dev/joblib/joblib/_parallel_backends.py in __init__(self, batch)
    541         # Don't delay the application, to avoid keeping the input
    542         # arguments in memory
--> 543         self.results = batch()
    544 
    545     def get(self):

~/dev/joblib/joblib/parallel.py in __call__(self)
    259         with parallel_backend(self._backend):
    260             return [func(*args, **kwargs)
--> 261                     for func, args, kwargs in self.items]
    262 
    263     def __len__(self):

~/dev/joblib/joblib/parallel.py in <listcomp>(.0)
    259         with parallel_backend(self._backend):
    260             return [func(*args, **kwargs)
--> 261                     for func, args, kwargs in self.items]
    262 
    263     def __len__(self):

~/dev/scikit-learn/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)             
    522             estimator.fit(X_train, **fit_params)
    523         else:
--> 524             estimator.fit(X_train, y_train, **fit_params)
    525 
    526     except Exception as e:

~/dev/scikit-learn/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    263             This estimator
    264         """
--> 265         Xt, fit_params = self._fit(X, y, **fit_params)
    266         if self._final_estimator is not None:
    267             self._final_estimator.fit(Xt, y, **fit_params)

~/dev/scikit-learn/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    228                 Xt, fitted_transformer = fit_transform_one_cached(
    229                     cloned_transformer, Xt, y, None,
--> 230                     **fit_params_steps[name])
    231                 # Replace the transformer of the step with the fitted                                                        
    232                 # transformer. This is necessary when loading the transformer

~/dev/joblib/joblib/memory.py in __call__(self, *args, **kwargs)
    320 
    321     def __call__(self, *args, **kwargs):
--> 322         return self.func(*args, **kwargs)
    323 
    324     def call_and_shelve(self, *args, **kwargs):

~/dev/scikit-learn/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, **fit_params)
    612 def _fit_transform_one(transformer, X, y, weight, **fit_params):
    613     if hasattr(transformer, 'fit_transform'):
--> 614         res = transformer.fit_transform(X, y, **fit_params)
    615     else:
    616         res = transformer.fit(X, y, **fit_params).transform(X)

~/dev/scikit-learn/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    460         else:
    461             # fit method of arity 2 (supervised transformation)
--> 462             return self.fit(X, y, **fit_params).transform(X)
    463 
    464 

~/dev/scikit-learn/sklearn/feature_selection/univariate_selection.py in fit(self, X, y)
    347 
    348         self._check_params(X, y)
--> 349         score_func_ret = self.score_func(X, y)
    350         if isinstance(score_func_ret, (list, tuple)):
    351             self.scores_, self.pvalues_ = score_func_ret

~/dev/joblib/joblib/memory.py in __call__(self, *args, **kwargs)
    509 
    510     def __call__(self, *args, **kwargs):
--> 511         return self._cached_call(args, kwargs)[0]
    512 
    513     def __reduce__(self):

~/dev/joblib/joblib/memory.py in _cached_call(self, args, kwargs)
    440         # Compare the function code with the previous to see if the
    441         # function code has changed
--> 442         func_id, args_id = self._get_output_identifiers(*args, **kwargs)
    443         metadata = None
    444         msg = None

~/dev/joblib/joblib/memory.py in _get_output_identifiers(self, *args, **kwargs)
    530         """Return the func identifier and input parameter hash of a result."""
    531         func_id = _build_func_identifier(self.func)
--> 532         argument_hash = self._get_argument_hash(*args, **kwargs)
    533         return func_id, argument_hash
    534 

~/dev/joblib/joblib/memory.py in _get_argument_hash(self, *args, **kwargs)
    524 
    525     def _get_argument_hash(self, *args, **kwargs):
--> 526         return hashing.hash(filter_args(self.func, self.ignore, args, kwargs),
    527                             coerce_mmap=(self.mmap_mode is not None))
    528 

~/dev/joblib/joblib/func_inspect.py in filter_args(func, ignore_lst, args, kwargs)
    306 
    307     # Now remove the arguments to be ignored
--> 308     for item in ignore_lst:
    309         if item in arg_dict:
    310             arg_dict.pop(item)

TypeError: 'bool' object is not iterable

@jnothman
Contributor Author

You're right that __reduce__ using kwargs is probably more robust. I'm not sure yet what's happening in the pickle-in-0.11-restore-in-0.12 bug, though.

@jnothman
Contributor Author

A __reduce__ like:

        return (self.__class__, (), {k: v for k, v in vars(self).items()
                                     if k != 'timestamp'})

passes the test here, but the cross-version compatibility of pickles is something that would need to be dealt with in __init__. I would need to move backend to after bytes_limit if that was a goal.

This dict-based approach has the limitation that if a parameter were renamed again, cross-version compatibility would be hard to handle, because restoring the object involves running __init__ and then setting the __dict__ directly.
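For reference (not part of the PR itself), the three-element __reduce__ return value sketched above follows plain pickle semantics: the callable is invoked with the (empty) argument tuple, and, absent a __setstate__, the third element is applied to the new object's __dict__. A minimal toy class (hypothetical names):

```python
import pickle

class Cache:
    """Toy class using a dict-based __reduce__ (illustrative only)."""
    def __init__(self, location=None, verbose=1):
        self.location = location
        self.verbose = verbose
        self.timestamp = 123.0  # transient: deliberately excluded from pickles

    def __reduce__(self):
        # (callable, args, state): pickle calls Cache() with defaults, then --
        # since there is no __setstate__ -- updates __dict__ with `state`.
        return (self.__class__, (), {k: v for k, v in vars(self).items()
                                     if k != 'timestamp'})

c = pickle.loads(pickle.dumps(Cache(location='/tmp/test', verbose=0)))
print(c.location, c.verbose)  # /tmp/test 0
print(c.timestamp)            # 123.0 -- __init__ ran first and re-set it
```

Because the state dict carries raw attribute names, a later rename of an attribute would leave old pickles setting a stale name on the new object, which is exactly the cross-version limitation noted above.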

Should we be excluding joblib 0.12 from scikit-learn 0.20RC?

@lesteve
Member

lesteve commented Aug 14, 2018

Should we be excluding joblib 0.12 from scikit-learn 0.20RC?

Not sure what the timeline for the RC is but these problems seem fixable. I'll try to take a look shortly.

There were a lot of improvements in joblib between 0.11 and 0.12 (the Loky backend, which will hopefully get rid of multiprocessing freezes; work on the dask distributed backend; store backends for Memory; etc.), and we probably want them for scikit-learn 0.20.

@jnothman
Contributor Author

jnothman commented Aug 14, 2018 via email

@lesteve
Member

lesteve commented Aug 14, 2018

Is the Loky backend now the default?

Yes

Is it stable?

There were some recent fixes for the stability problems that were found so far. I am definitely not the expert on this though.

Should it be mentioned in the Scikit-learn what's new?

Probably if this is not mentioned already.

@lesteve
Member

lesteve commented Aug 14, 2018

Sorry @jnothman, I didn't see that you had pushed some commits, so I worked on a different PR, #746. Feedback more than welcome.
