[MRG+1] IsolationForest max_samples warning and calculation #5678
glouppe merged 8 commits into scikit-learn:master from betatim:no-warning-iforest
Conversation
max_samples='auto' replaces max_samples=256 as the default setting. max_depth is now recalculated from the actual value of max_samples, after the range and other constraints have been checked.
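The depth recalculation this PR describes can be sketched as follows. This is a hedged illustration, not the actual iforest.py code: the helper name `isolation_tree_depth` is invented, and it only shows the ceil(log2(...)) rule that ties the tree depth limit to the subsample size.

```python
import math

def isolation_tree_depth(max_samples):
    """Hypothetical sketch: depth limit for an isolation tree.

    The expected path length in a random binary tree grows like
    log2(n), so the depth is capped at ceil(log2(max_samples)).
    The max(..., 2) guard avoids log2 of 0 or 1 for tiny subsamples.
    """
    return int(math.ceil(math.log2(max(max_samples, 2))))

# With the old fixed default of 256 samples the limit is 8; the point
# of the fix is to recompute this after max_samples is resolved,
# instead of keeping a depth derived from the stale 256 default.
```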
sklearn/ensemble/iforest.py
Here you are duplicating code from BaseBagging._fit(). You don't need L158, L159, L160 or the paragraph at L169.
Added a test that checks that max_depth is recalculated for small samples. Re-added the max_samples_ property to gain access to it in predict().
I re-added the … Added a test to check that …
We introduced this attribute before adding the …
sklearn/ensemble/iforest.py
It must be on base_estimator_, as you cannot modify an init argument at any time.
+1
base_estimator_ doesn't exist. There is a list of the fitted estimators (self.estimators_) but setting the property after fitting is too late. BaseEnsemble explicitly delays instantiating the estimators so that the parameters of the base_estimator can still be changed. So I think this is what you want to be doing. The only alternative I see is not instantiating the base estimator until fit() is being called. For this we'd have to change BaseEnsemble though, as it expects you to pass a value for the base_estimator argument.
One other way would be to add an optional argument max_depth (to use instead of self.max_depth) to _fit() in BaseBagging, and to change self.base_estimator_.max_depth after self._validate_estimator() (L290 in bagging.py). Indeed, _validate_estimator() makes base_estimator_ exist.
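The proposal above could look roughly like this. A minimal sketch with a made-up class name (BaseBaggingSketch, not the real bagging.py code), only to show where the override hooks in after _validate_estimator() has created base_estimator_:

```python
class BaseBaggingSketch:
    """Hypothetical stand-in for BaseBagging, to illustrate the idea."""

    def __init__(self, base_estimator):
        # Init arguments are stored untouched; scikit-learn's
        # convention is never to modify them after __init__.
        self.base_estimator = base_estimator

    def _validate_estimator(self):
        # In scikit-learn this sets up base_estimator_ from the init
        # argument; after this call base_estimator_ exists and its
        # parameters can still be changed, since no tree is built yet.
        self.base_estimator_ = self.base_estimator

    def _fit(self, X, max_depth=None):
        self._validate_estimator()
        if max_depth is not None:
            # Use this value instead of the one initially passed to
            # the base estimator (the optional-argument proposal).
            self.base_estimator_.max_depth = max_depth
        return self
```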
Sorry for being a bit thick, I think I get it now.
Added a test for … Finally understood what you meant with …
Design looks good! You have a Travis failure.
sklearn/ensemble/bagging.py
you should say
Argument to use instead of the one initially passed to the base estimator.
This is supported only if the base estimator has a max_depth parameter.
Apart from these two minor comments, everything looks great to me. Thanks a lot @betatim :) !
Turns out in … Thanks for your patience! 🍰
@ngoix ok for you? |
Yes, everything looks fine. Any idea why AppVeyor is not happy?
I don't know why it only breaks for that particular instance (Python 2.7.8, 64-bit), but the error is raised here:
What happens if you change the else statement in …?
sklearn/ensemble/bagging.py
This is another left-over bug that we missed. It should be max_samples = int(max_samples * X.shape[0]) (without self).
@betatim if you make this change appveyor should be happy.
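The fix can be illustrated in isolation. A sketch with an invented helper name (`resolve_max_samples`, not the real bagging.py function), assuming the float branch means "fraction of the training set":

```python
def resolve_max_samples(max_samples, n_samples):
    """Hypothetical helper mirroring the corrected bagging.py logic."""
    if isinstance(max_samples, float):
        # The fix: multiply the *local* max_samples by the number of
        # rows, i.e. int(max_samples * X.shape[0]) without `self.` --
        # using self.max_samples here would reread the raw init value
        # instead of the validated local one.
        return int(max_samples * n_samples)
    # Integer values are used directly, capped at the data size.
    return min(max_samples, n_samples)
```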
There is another thing that I don't currently understand in the codebase. In … (Sorry @betatim for uncovering more bugs and making your PR longer :))
Yes you are right @glouppe it was wrong, but it is fixed now :) (see L229) |
Oh yeah right, forget what I said :) |
AppVeyor is unhappy because L266 checks if the argument is an … Not quite sure where we get the …
Use six.integer_types to test if you have an int.
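A sketch of the more permissive check. `numbers.Integral` is used here instead of `six.integer_types` so the snippet needs no third-party import; both accept Python 2's `long` as well as plain `int`, which is exactly what a bare isinstance(value, int) test misses. The helper name is invented for illustration:

```python
import numbers

def is_int_like(value):
    """Hypothetical check: accept any integral type.

    On Python 2, values that overflow a machine int become `long`,
    so isinstance(value, int) can fail on one platform only (as in
    the Python 2.7.8 64-bit AppVeyor build discussed above).
    numbers.Integral, like six.integer_types, accepts them all.
    """
    return isinstance(value, numbers.Integral)
```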
Merci. The check I used was inspired by others in …
All builds are green again. |
+1 for merge. @glouppe, merge if you're happy too.
Looking good, merging. Thanks Tim! |
🍰 Just noticed the number of this PR! |
In response to #5672

Introduces max_samples='auto' as the new default to make it possible to check if the user set a non-default value. Only warn about max_samples > n_samples if the user set the value explicitly.

Fixes a bug where max_depth was not recalculated when max_samples has changed.

Now 256 appears as a magic value in both fit and predict, which isn't nice. I'll think about how to make it nicer, suggestions welcome.

todo:
- check max_depth recalculation works

/cc @ngoix