Fixes #11128 : Default n_estimator value should be 100 #11172
Olamyy wants to merge 25 commits into scikit-learn:master from Olamyy:n_estimator_should_be_100
Conversation
Upcoming commits with deprecation warning in progress.
jnothman
left a comment
Please see our documentation on deprecating default values. Changing it like this would break backwards compatibility.
jnothman
left a comment
Please add a test that the warning works correctly. We have helpers like assert_warns.
sklearn/ensemble/forest.py
Outdated
    self.class_weight = class_weight

    if self.n_estimators == 10:
        warnings.warn("'n_estimators' default value was changed to 100 in version 0.20.", DeprecationWarning)
What we want is "n_estimators default value will change to 100 in version 0.20".
By convention, this warning should only be issued in fit, not __init__.
doc/whats_new/v0.20.rst
Outdated
    several times in a row. :issue:`9234` by :user:`andrewww <andrewww>`
    and :user:`Minghui Liu <minghui-liu>`.

    - In both :class:`ensemble.RandomForestClassifier` and `ensemble.ExtraTreesClassifier`,
This should be under API changes, rather than enhancements.
Fixed this too in my latest commit.
@jnothman For the test, do I create a new, say,
add a function in
    @pytest.mark.parametrize('name', FOREST_ESTIMATORS)
    def test_n_estimator_raises_warning(name):
        check_raises_n_estimator_warning(name)
There is no need to have a separate function.
Please also test that no warning is raised if the user explicitly passes a value.
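A test covering both cases might look as follows. This is a self-contained sketch: `ForestStub` and `fit_and_catch` are hypothetical stand-ins (scikit-learn's real helpers such as `assert_warns` live in its test utilities), used here only to illustrate the two assertions the reviewer asks for.

```python
import warnings


class ForestStub:
    # Hypothetical stand-in for a forest estimator that warns on the default.
    def __init__(self, n_estimators='warn'):
        self.n_estimators = n_estimators

    def fit(self, X, y):
        if self.n_estimators == 'warn':
            warnings.warn("The default value of n_estimators will change "
                          "from 10 in version 0.20 to 100 in 0.22.",
                          FutureWarning)
        return self


def fit_and_catch(est):
    # Fit the estimator and return the list of warnings it raised.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        est.fit([[0], [1]], [0, 1])
    return caught


# Default value: a FutureWarning must be raised.
default_warnings = fit_and_catch(ForestStub())
assert any(issubclass(w.category, FutureWarning) for w in default_warnings)

# Explicit value (even the old default of 10): no warning at all.
explicit_warnings = fit_and_catch(ForestStub(n_estimators=10))
assert not explicit_warnings
```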
sklearn/ensemble/forest.py
Outdated
| """ | ||
| # Validate hyperparameters | ||
| if self.n_estimators == 10: | ||
| warnings.warn("'n_estimators' default value will change to 100 in version 0.20", DeprecationWarning) |
We don't want to break the user code.
Currently, with your PR, a user relying on the default value will see a change in behaviour, in particular an increased fit time.
We probably want to just warn about the future change:
- we should not change the value to 100
- the DeprecationWarning should be a FutureWarning
- the warning needs to say "will be changed in 0.22", instead of "has changed in 0.20"
- the warning could also be more pedagogical, explaining this value needs to be changed manually, as suggested in the issue Change default n_estimators in RandomForest (to 100?) #11128 (comment)
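Taken together, those points amount to the sentinel pattern discussed later in this thread. Here is a minimal sketch of it; the class name, message text, and `n_estimators_` attribute are illustrative, not the actual scikit-learn implementation:

```python
import warnings


class ForestSketch:
    """Illustrative sketch of the 'warn' sentinel pattern (hypothetical class)."""

    def __init__(self, n_estimators='warn'):
        # 'warn' is a sentinel meaning "the user did not pass a value";
        # any explicit value, including 10, silences the warning.
        self.n_estimators = n_estimators

    def fit(self, X, y):
        if self.n_estimators == 'warn':
            warnings.warn(
                "The default value of n_estimators will change from 10 in "
                "version 0.20 to 100 in 0.22. Set n_estimators explicitly "
                "to silence this warning.",
                FutureWarning)
            n_estimators = 10  # keep the old behaviour for now
        else:
            n_estimators = self.n_estimators
        self.n_estimators_ = n_estimators
        return self


# The default triggers the FutureWarning but still fits 10 estimators.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    est = ForestSketch().fit([[0]], [0])
assert est.n_estimators_ == 10 and caught
```

Note that the warning is issued in `fit`, not `__init__`, matching the convention mentioned earlier in the review.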
…20 to 0.22. Provided a simple helper on changing the default value to 100
    @@ -0,0 +1,118 @@
    =======
sklearn/ensemble/forest.py
Outdated
| """ | ||
| # Validate hyperparameters | ||
| if self.n_estimators == 10: | ||
| warnings.warn("'n_estimators' default value will be changed to 100 in version 0.22. For optimal performance, you should make change this value to 100", FutureWarning) |
I don't think that you should have the statement about optimal performance.
Also, line length must be < 80.
    ForestClassifier = FOREST_CLASSIFIERS[name]
    clf = ForestClassifier(n_estimators=100, random_state=1, max_features=1,
                           max_depth=1)
    assert_warns(DeprecationWarning, func=clf.fit, X=iris.data, y=iris.target)
shouldn't this be assert_no_warnings?
Totally missed that. Fixed in my latest commit.
sklearn/ensemble/forest.py
Outdated
    self : object
    """
    # Validate hyperparameters
    if self.n_estimators == 10:
Often we would do something like change the default to None, and treat it as "warn and use 10", to avoid warning when the user explicitly sets n_estimators=10.
Is this in the dev guide? I don't think so, right? Also, that deprecation warnings should go in fit isn't documented either, I think. New issue? ;)
@jnothman, I'm not sure I understand you.
Are you advising I change the n_estimators value to None?
@amueller I've created two issues to take care of both of these.
Yes, defaulting n_estimators=None would be fine. n_estimators='warn'
I think we already have some good PRs to serve as examples (e.g., #9397, #10331)? And I think @TomDLT's comment (#11172 (comment)) has summarized what to do here.
    print(cm)

    #import matplotlib.pyplot as plt
    #import matlotlib.pyplot as plt
Please revert this unrelated change
sklearn/ensemble/forest.py
Outdated
    Parameters
    ----------
    n_estimators : integer, optional (default=10)
    n_estimators : integer, optional (default=100)
The default is still 10. There should be a note that it'll change in 0.22.
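One way to phrase such a note, sketched as a numpydoc fragment embedded in a Python string. The exact wording is hypothetical; it only assumes the ``.. versionchanged::`` directive that scikit-learn's docstrings already use:

```python
# Hypothetical docstring fragment for the n_estimators parameter.
N_ESTIMATORS_DOC = """\
n_estimators : integer, optional (default=10)
    The number of trees in the forest.

    .. versionchanged:: 0.20
       The default value of ``n_estimators`` will change from 10 to 100
       in version 0.22. Pass the value explicitly to silence the
       FutureWarning and keep the current behaviour.
"""
print(N_ESTIMATORS_DOC)
```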
@Olamyy
@Olamyy Are you still willing to fix this issue? Otherwise I will take care of it.
Hi @glemaitre, yes I am. Been a little busy. I should be able to complete it tonight.
Great thanks ;)
…ent. Made sure all tests are passing.
    # more useful.
    # Fit the pipeline on the training set using grid search for the parameters

    # TASK: print the cross-validated scores for the each parameters set
Please revert this unrelated change
    ForestClassifier = FOREST_CLASSIFIERS[name]
    clf = ForestClassifier(n_estimators=10, random_state=1, max_features=1,
                           max_depth=1)
    assert_warns(FutureWarning, func=clf.fit, X=iris.data, y=iris.target)
Explicitly passing n_estimators=10 should not result in a warning.
Passing nothing for n_estimators should result in a warning.
sklearn/ensemble/forest.py
Outdated
    self : object
    """
    # Validate hyperparameters
    if self.n_estimators == 10:
Yes, defaulting n_estimators=None would be fine. n_estimators='warn'
This pull request introduces 1 alert when merging e8147fc into 8083ea4 - view on LGTM.com new alerts:
Comment posted by LGTM.com
jnothman
left a comment
The default value doesn't matter so much. The default behaviour must be to warn and use 10 estimators. And the behaviour when a number is passed must be to not warn.
    @pytest.mark.parametrize('name', FOREST_CLASSIFIERS)
    def test_warning_raised_with_deprecated_n_estimator(name):
This should be called "no_warning_raised".
sklearn/ensemble/forest.py
Outdated
    self : object
    """
    # Validate hyperparameters
    if not self.n_estimators:
    :class:`linear_model.LogisticRegression` when ``verbose`` is set to 0.
    :issue:`10881` by :user:`Alexandre Sevin <AlexandreSev>`.

    - In both :class:`ensemble.RandomForestClassifier` and `ensemble.ExtraTreesClassifier`,
Missing :class: for the extra trees. Also, the message should read something like "n_estimators will be changed to 100 in version 0.22. A FutureWarning is raised when the default value is used." or similar. n_estimators=10 doesn't raise a warning (as is tested).
jnothman
left a comment
We need to have the default n_estimators=None.
We would like to have this finished before an upcoming release, as we believe it is important that we correct old wrongs.
I hope it is okay if another contributor completes this work.
Fixes #11128