[MRG] EHN: Change default n_estimators to 100 for random forest#11542
[MRG] EHN: Change default n_estimators to 100 for random forest#11542amueller merged 14 commits intoscikit-learn:masterfrom
Conversation
|
Is this based on #11172? The contributor there seems to have addressed the comments there yesterday... |
|
though it looks like #11172 is still not right... |
sklearn/ensemble/forest.py
Outdated
| n_estimators : integer, optional (default=10) | ||
| The number of trees in the forest. | ||
|
|
||
| .. deprecated:: 0.20 |
There was a problem hiding this comment.
should be "versionchanged" not "deprecated"
There was a problem hiding this comment.
versionchanged as one long word, no spaces?
There was a problem hiding this comment.
working on that.
| assert_equal(tree.min_impurity_decrease, 0.1) | ||
|
|
||
|
|
||
| def test_nestimators_future_warning(): |
There was a problem hiding this comment.
It might be better to you pytest.parametrize as above instead of the loop, which will run each estimator as a separate test.
There was a problem hiding this comment.
FYI:
@pytest.mark.parametrize('forest', [RandomForestClassifier(), RandomForestRegressor(),
ExtraTreesClassifier(), ExtraTreesRegressor(),
RandomTreesEmbedding()])
def test_n_estimators_future_warning(estimator):
....
estimator.fit(X, y)
....There was a problem hiding this comment.
Might be better to parametrize with classes,
@pytest.mark.parametrize('forest', [RandomForestClassifier, RandomForestRegressor,
[...]then create the corresponding instances inside the test -- this works better for getting a human readable test name with pytest..
|
though it looks like #11172 is still not right... This looks pretty good. Ideally you'd catch also deprecation warnings if they are raised in the tests now. |
glemaitre
left a comment
There was a problem hiding this comment.
You will also need to add an entry in the what's new file for v0.20 stating the change of behavior in the future.
| self : object | ||
| """ | ||
|
|
||
| if self.n_estimators == 'warn': |
There was a problem hiding this comment.
The check and validation should be done in fit instead of __init__
There was a problem hiding this comment.
So should I change back to n_estimators=10 instead of n_estimators='warn', and then change my if conditional check in the fit() method?
There was a problem hiding this comment.
no the warn is good, just the test should be in the other place.
There was a problem hiding this comment.
You can refer to: https://github.com/scikit-learn/scikit-learn/pull/11469/files#diff-e6faf37b13574bc591afbf0536128735R864
This is still not merged but we follow this convention: __init__ just assign the parameters to the class attributes and we do checking and validation in the fit method.
There was a problem hiding this comment.
Aren't lines 245 and 246 above inside the fit() method?
There was a problem hiding this comment.
Ups sorry it is good there. I good confused with another PR :)
|
|
||
|
|
||
| def test_nestimators_future_warning(): | ||
| # Test that n_estimators future warning is raised. Will be removed in 0.22 |
There was a problem hiding this comment.
You can use FIXME: to be removed 0.22
| assert_equal(tree.min_impurity_decrease, 0.1) | ||
|
|
||
|
|
||
| def test_nestimators_future_warning(): |
There was a problem hiding this comment.
FYI:
@pytest.mark.parametrize('forest', [RandomForestClassifier(), RandomForestRegressor(),
ExtraTreesClassifier(), ExtraTreesRegressor(),
RandomTreesEmbedding()])
def test_n_estimators_future_warning(estimator):
....
estimator.fit(X, y)
....|
FYI: I updated the title of this PR. |
|
@annaayzenshtat this is a blocker for 0.20 (which we are actively working on right now). If you don't have time to address the comments at this moment that's completely fine. Ping me and I'll take over the PR. |
|
I'm still working on this issue |
|
I committed the requested changes. Please take a look at these code changes. |
|
Actually you need to flag the tests with pytest.mark.filterwarnings to avoid raising the future warning in the tests (typically the one that does not set n_estimators) |
|
Ok, I'll change it. |
glemaitre
left a comment
There was a problem hiding this comment.
You can check this PR as an example how to use pytest
https://github.com/scikit-learn/scikit-learn/pull/11574/files
|
I flagged the test with pytest.mark.filterwarnings. |
|
@annaayzenshtat I am helping a bit with the failure that you got and I am filtering the warning because it seems that they are in a lot of places. |
|
Ok, thank you! |
|
python2.7 test failure :-/ |
|
In SAG?! |
|
Is there something I'm supposed to do to fix the Python 2.7 failure? |
|
Nop this is some side effect already shown and solve in #11574 |
|
Ok. |
|
@annaayzenshtat Thanks a lot for the contribution. |
|
Thank you! |
Reference Issues/PRs
Fixes #11128.
What does this implement/fix? Explain your changes.
Issues deprecation warning message for the default n_estimators parameter for the forest classifiers. Test added for the warning message when the default parameter is used.
Any other comments?