[MRG] EHN: Change default n_estimators to 100 for random forest by annaayzenshtat · Pull Request #11542 · scikit-learn/scikit-learn

annaayzenshtat · 2018-07-15T22:29:04Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Issues a deprecation warning when the default value of the n_estimators parameter is used by the forest classifiers. Adds a test for the warning raised when the default parameter is used.

Any other comments?

…test

amueller · 2018-07-15T22:34:28Z

Is this based on #11172? The contributor there seems to have addressed the comments there yesterday...

amueller · 2018-07-15T22:35:01Z

though it looks like #11172 is still not right...

amueller · 2018-07-15T22:35:30Z

sklearn/ensemble/forest.py

    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

+        .. deprecated:: 0.20


should be "versionchanged" not "deprecated"

versionchanged as one long word, no spaces?

yes. git grep versionchanged?

working on that.
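For reference, a minimal sketch of where the directive would sit in a numpydoc-style docstring. `ToyForest` is a hypothetical class, not scikit-learn code, and the sentence under the directive is assumed wording based on the PR title (default moving from 10 to 100) and the 0.22 removal note in the test comment.

```python
class ToyForest:
    """Hypothetical estimator used only to illustrate the docstring layout.

    Parameters
    ----------
    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

        .. versionchanged:: 0.20
           The default value of ``n_estimators`` will change from 10 in
           version 0.20 to 100 in version 0.22.  (assumed wording)
    """

    def __init__(self, n_estimators=10):
        # Only the docstring matters for this sketch.
        self.n_estimators = n_estimators
```

The parameter itself is not going away, only its default is changing, which is why `versionchanged` fits better than `deprecated` here.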

amueller · 2018-07-15T22:36:16Z

sklearn/ensemble/tests/test_forest.py

            assert_equal(tree.min_impurity_decrease, 0.1)
+
+
+def test_nestimators_future_warning():


It might be better to use pytest.mark.parametrize as above instead of the loop, which will run each estimator as a separate test.

FYI:

    @pytest.mark.parametrize('forest', [RandomForestClassifier(),
                                        RandomForestRegressor(),
                                        ExtraTreesClassifier(),
                                        ExtraTreesRegressor(),
                                        RandomTreesEmbedding()])
    def test_n_estimators_future_warning(forest):
        ....
        forest.fit(X, y)
        ....

Might be better to parametrize with classes,

@pytest.mark.parametrize('forest', [RandomForestClassifier, RandomForestRegressor, [...]

then create the corresponding instances inside the test; this gives a more human-readable test name with pytest.

Fair enough
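The class-based parametrization suggested in this thread could look roughly like the sketch below. `ToyClassifier` and `ToyRegressor` are hypothetical stand-ins for the forest estimators, and the warning text is a placeholder; only `pytest.mark.parametrize` and `pytest.warns` are real pytest APIs.

```python
import warnings

import pytest


class ToyClassifier:
    """Hypothetical estimator; warns when n_estimators is left at its default."""

    def __init__(self, n_estimators='warn'):
        self.n_estimators = n_estimators

    def fit(self, X, y):
        if self.n_estimators == 'warn':
            # Placeholder message, not the text used in the PR.
            warnings.warn("default n_estimators will change", FutureWarning)
        return self


class ToyRegressor(ToyClassifier):
    """Second hypothetical estimator so the test has two parameters."""


# Passing classes (not instances) lets pytest derive the test ID from the
# class name, e.g. test_n_estimators_future_warning[ToyClassifier].
@pytest.mark.parametrize('forest', [ToyClassifier, ToyRegressor])
def test_n_estimators_future_warning(forest):
    est = forest()  # instantiate inside the test
    with pytest.warns(FutureWarning):
        est.fit([[0], [1]], [0, 1])
```

With instances in the parameter list, pytest falls back to positional IDs such as `forest0` and `forest1`, which are harder to read in a failure report.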

amueller · 2018-07-15T22:36:48Z

though it looks like #11172 is still not right...

This looks pretty good. Ideally you'd also catch the deprecation warnings if they are now raised in the tests.

glemaitre

You will also need to add an entry in the what's new file for v0.20 stating the change of behavior in the future.

glemaitre · 2018-07-15T22:44:56Z

sklearn/ensemble/forest.py

        self : object
        """
+
+        if self.n_estimators == 'warn':


The check and validation should be done in fit instead of __init__

So should I change back to n_estimators=10 instead of n_estimators='warn', and then change my if conditional check in the fit() method?

No, the 'warn' default is good; it's just that the check should be in the other place.

You can refer to: https://github.com/scikit-learn/scikit-learn/pull/11469/files#diff-e6faf37b13574bc591afbf0536128735R864

This is still not merged, but we follow this convention: __init__ just assigns the parameters to the class attributes, and we do the checking and validation in the fit method.

Aren't lines 245 and 246 above inside the fit() method?

Oops, sorry, it is good there. I got confused with another PR :)
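The convention described in this thread (store parameters in `__init__`, check them in `fit`) combined with the `'warn'` sentinel can be sketched as follows. `ToyForest`, the `n_estimators_` attribute, and the warning wording are assumptions made for illustration; this is not the code from the PR.

```python
import warnings


class ToyForest:
    """Hypothetical stand-in for a forest estimator (not scikit-learn code)."""

    def __init__(self, n_estimators='warn'):
        # __init__ only stores the parameter; no validation happens here.
        self.n_estimators = n_estimators

    def fit(self, X, y):
        # Resolve the sentinel at fit time and warn about the future default.
        n_estimators = self.n_estimators
        if n_estimators == 'warn':
            warnings.warn(
                "The default value of n_estimators will change from 10 "
                "to 100 in version 0.22.",  # assumed wording
                FutureWarning,
            )
            n_estimators = 10  # keep the current behaviour for now
        self.n_estimators_ = n_estimators  # hypothetical fitted attribute
        return self
```

Using a sentinel rather than `n_estimators=10` lets `fit` tell apart a user who explicitly asked for 10 trees from one who relied on the default, so only the latter sees the warning.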

glemaitre · 2018-07-15T22:45:49Z

sklearn/ensemble/tests/test_forest.py

+
+
+def test_nestimators_future_warning():
+    # Test that n_estimators future warning is raised. Will be removed in 0.22


You can use "FIXME: to be removed in 0.22".

glemaitre · 2018-07-15T22:47:40Z

sklearn/ensemble/tests/test_forest.py

            assert_equal(tree.min_impurity_decrease, 0.1)
+
+
+def test_nestimators_future_warning():


FYI:

    @pytest.mark.parametrize('forest', [RandomForestClassifier(),
                                        RandomForestRegressor(),
                                        ExtraTreesClassifier(),
                                        ExtraTreesRegressor(),
                                        RandomTreesEmbedding()])
    def test_n_estimators_future_warning(forest):
        ....
        forest.fit(X, y)
        ....

glemaitre · 2018-07-16T04:02:35Z

FYI: I updated the title of this PR.

massich · 2018-07-16T16:23:04Z

@annaayzenshtat this is a blocker for 0.20 (which we are actively working on right now). If you don't have time to address the comments at this moment that's completely fine. Ping me and I'll take over the PR.

annaayzenshtat · 2018-07-16T18:19:51Z

I'm still working on this issue

annaayzenshtat · 2018-07-17T05:50:17Z

I committed the requested changes. Please take a look at these code changes.

glemaitre · 2018-07-17T05:53:35Z

Actually, you need to flag the tests with pytest.mark.filterwarnings to avoid raising the future warning in the tests (typically the ones that do not set n_estimators).

annaayzenshtat · 2018-07-17T05:54:47Z

Ok, I'll change it.

glemaitre

You can check this PR as an example of how to use pytest:

https://github.com/scikit-learn/scikit-learn/pull/11574/files

annaayzenshtat · 2018-07-17T06:23:00Z

I flagged the test with pytest.mark.filterwarnings.
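A minimal sketch of flagging a test this way is shown below. `fit_with_default` is a hypothetical helper standing in for an estimator fitted without `n_estimators`, and the message prefix in the filter is an assumed wording; `pytest.mark.filterwarnings` itself is a real pytest marker.

```python
import warnings

import pytest


def fit_with_default():
    """Hypothetical helper that emits the kind of warning discussed above."""
    warnings.warn("The default value of n_estimators will change.",
                  FutureWarning)
    return "fitted"


# The filter string follows the "action:message" form used by the warnings
# module; the message part is matched against the start of the warning text.
@pytest.mark.filterwarnings('ignore:The default value of n_estimators')
def test_unrelated_behaviour():
    # Under pytest, the FutureWarning is ignored for this test only.
    assert fit_with_default() == "fitted"
```

The marker only takes effect when the test is collected and run by pytest; calling the function directly bypasses the filter.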

glemaitre · 2018-07-17T15:48:04Z

@annaayzenshtat I am helping a bit with the failure that you got, and I am filtering the warnings because they seem to be raised in a lot of places.

amueller

lgtm if tests pass

annaayzenshtat · 2018-07-17T18:29:50Z

Ok, thank you!

amueller · 2018-07-17T19:04:49Z

python2.7 test failure :-/

amueller · 2018-07-17T19:05:39Z

In SAG?!

annaayzenshtat · 2018-07-17T19:15:43Z

Is there something I'm supposed to do to fix the Python 2.7 failure?

glemaitre · 2018-07-17T19:22:47Z

Nope, this is a side effect already seen and solved in #11574.

annaayzenshtat · 2018-07-17T19:27:24Z

Ok.

glemaitre · 2018-07-17T19:48:53Z

@annaayzenshtat Thanks a lot for the contribution.
Feel free to take any other issue ;)

annaayzenshtat · 2018-07-17T20:00:37Z

Thank you!

annaayzenshtat added 5 commits July 15, 2018 16:50

Added deprecation warning for n_estimators default value and created …

cd948b6

…test

Changed msg_future string

a3738cb

Added period to n_estimators warning message

e991bc1

Fixed linting issues pertaining to code I added

ef33157

Removed blank line

76c8092

amueller reviewed Jul 15, 2018

glemaitre requested changes Jul 15, 2018

glemaitre changed the title ~~Fix to Issue #11128: Create deprecation warning for default n_estimators in RandomForest~~ EHN: Change default n_estimators to 100 for random forest Jul 16, 2018

glemaitre changed the title ~~EHN: Change default n_estimators to 100 for random forest~~ [MRG] EHN: Change default n_estimators to 100 for random forest Jul 16, 2018

annaayzenshtat added 3 commits July 17, 2018 00:44

Added entry for change in default of n_estimators parameter

be10d8d

Changed deprecated to versionchanged

fbdf66c

Changed loop to pytest.mark.parametrize

2aa3b7f

glemaitre approved these changes Jul 17, 2018

glemaitre requested changes Jul 17, 2018

Added pytest.mark.filterwarnings to filter n_estimators warning

cec4dd7

TST add filter warnings in the ensemble module

2cf678e

glemaitre added 2 commits July 17, 2018 18:27

TST avoid further future warning in tests

0305354

MAINT do not show the warning

a7e7a93

glemaitre added 2 commits July 17, 2018 18:49

DOC set the number of estimators in examples

68d9168

cleaning

bb1d786

glemaitre approved these changes Jul 17, 2018

amueller approved these changes Jul 17, 2018

amueller merged commit 2242c59 into scikit-learn:master Jul 17, 2018

annaayzenshtat deleted the fix/n_estimators_100 branch July 17, 2018 19:49

qinhanmin2014 mentioned this pull request Jul 20, 2018

Fixes #11128 : Default n_estimator value should be 100 #11172

Closed
