MNT KBinsDiscretizer.transform should not mutate _encoder#12514
MNT KBinsDiscretizer.transform should not mutate _encoder#12514jnothman merged 6 commits intoscikit-learn:masterfrom qinhanmin2014:KBinsDiscretizer
Conversation
|
Hmm, I'm unable to figure out a way to correct the common test. I don't think it's good to use |
|
I'll take it after we figure out the solution. See #12490 (comment) |
ogrisel
left a comment
There was a problem hiding this comment.
This solution looks good to me.
I don't see how we could improve the common test (check_estimators_overwrite_params). I think it's fine as it is, even though it could not detect this specific issue.
If I recall correctly, this problem (with the _encoder attribute being mutated by KBinsDiscretizer.transform) was originally found by @pierreglaser doing concurrent calls to transform with the threading backend of joblib and the calls where not thread-safe as expected for transform.
I am not sure we want to add thread-safety checks for the transform method of transformers. This is not officially part of our public "API" as far as I know.
jnothman
left a comment
There was a problem hiding this comment.
Yes I think we know that that common test is a weak one. It does not check that the objects do not change. We could potentially make it tighter by using some kind of hash, but it could make it a rather heavy test.
Yes, this is the right fix IMO. I'm not sure that we need a non-regression test.
@ogrisel regarding requiring prediction methods to be thread-safe, I like the idea in general. I think we may elsewhere have prediction methods that update a random state. Maybe they don't... and maybe they shouldn't!
|
You can add a changelog entry mentioning that transform is now thread safe
|
|
Thanks @qinhanmin2014 |
* upstream/master: joblib 0.13.0 (scikit-learn#12531) DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537) DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533) ALL Add HashingVectorizer to __all__ (scikit-learn#12534) BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350) fix typo in whatsnew Fix dead link to numpydoc (scikit-learn#12532) [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485) MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529) FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462) MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
…ybutton * upstream/master: FIX YeoJohnson transform lambda bounds (scikit-learn#12522) [MRG] Additional Warnings in case OpenML auto-detected a problem with dataset (scikit-learn#12541) ENH Prefer threads for IsolationForest (scikit-learn#12543) joblib 0.13.0 (scikit-learn#12531) DOC tweak KMeans regarding cluster_centers_ convergence (scikit-learn#12537) DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (scikit-learn#12533) ALL Add HashingVectorizer to __all__ (scikit-learn#12534) BLD we should ensure continued support for joblib 0.11 (scikit-learn#12350) fix typo in whatsnew Fix dead link to numpydoc (scikit-learn#12532) [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (scikit-learn#12485) MNT (0.21) OPTiCS change the default `algorithm` to `auto` (scikit-learn#12529) FIX SkLearn `.score()` method generating error with Dask DataFrames (scikit-learn#12462) MNT KBinsDiscretizer.transform should not mutate _encoder (scikit-learn#12514)
…ikit-learn#12514)" This reverts commit 6ae1628.
…ikit-learn#12514)" This reverts commit 6ae1628.
Fixes #12490
Also fix a bug in the common test, which can serve as the regression test of the PR.I'm unable to figure out a way to correct the common test, see the comment below.