FIX feature importances in random forests should sum up to 1#13636

Merged
NicolasHug merged 7 commits into scikit-learn:master from adrinjalali:forest/feature_importances
Apr 15, 2019
Conversation

@adrinjalali
Member

As in the discussion in #7406 and #13620, the feature importances of random-forest-based models should sum to 1, and root-only trees should be ignored.
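A minimal sketch of the behavior this PR targets (a hypothetical helper, not scikit-learn's actual implementation): normalize each tree's importance vector so it sums to 1, skip root-only trees, whose importances are all zero, and average over the remaining trees.

```python
import numpy as np

def forest_feature_importances(per_tree_importances):
    """Normalize each tree's importance vector to sum to 1, skip
    root-only trees (all-zero importances), and average the rest.
    Illustrative sketch only, not scikit-learn's code."""
    valid = [imp / imp.sum() for imp in per_tree_importances if imp.sum() > 0]
    if not valid:
        return np.zeros_like(per_tree_importances[0], dtype=float)
    return np.mean(valid, axis=0)

# Per-tree importances over 2 features; the second tree is root-only.
trees = [np.array([2.0, 6.0]), np.array([0.0, 0.0]), np.array([1.0, 1.0])]
importances = forest_feature_importances(trees)
```

With the root-only tree ignored, the two normalized vectors [0.25, 0.75] and [0.5, 0.5] average to [0.375, 0.625], which sums to 1.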

Member

@NicolasHug NicolasHug left a comment


LGTM. I think we want a single entry in the what's new for all the ensemble models, so I would suggest updating this PR or #13620, whichever gets merged last.

                           n_classes=3)
clf = RandomForestClassifier(min_samples_leaf=5, random_state=42,
                             n_estimators=200).fit(X, y)
assert math.isclose(1, clf.feature_importances_.sum(), abs_tol=1e-7)
Member


Just curious, why not np.isclose()?

Member Author


numpy's isclose predates Python's math.isclose and has some issues compared to Python's. Here's a nice conversation on the topic: numpy/numpy#10161. Since we now support Python >= 3.5, math.isclose seems like a good choice when comparing two floats is all we need.
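To illustrate the difference discussed in that thread, with both functions at their default tolerances:

```python
import math
import numpy as np

# np.isclose(a, b) checks |a - b| <= atol + rtol * |b|. The test is
# asymmetric in a and b, and the nonzero default atol=1e-8 means any
# value smaller than ~1e-8 compares "close" to zero.
assert bool(np.isclose(1e-10, 0.0))

# math.isclose(a, b) checks |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
# It is symmetric, and with the default abs_tol=0.0 a purely relative
# comparison never treats a nonzero value as close to zero.
assert not math.isclose(1e-10, 0.0)
```

For the test above, where the target value is exactly 1, either function works; math.isclose with an explicit abs_tol just makes the tolerance semantics unambiguous.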

@adrinjalali
Member Author

LGTM. I think we want a single entry in the what's new for all the ensemble models, so I would suggest updating this PR or #13620, whichever gets merged last.

Sure, could do. But there's a slight difference: in the GB{C/R} models the importances already sum to 1, which is not the case for forests.


.. _Hanmin Qin: https://github.com/qinhanmin2014

.. _Adrin Jalali: https://github.com/adrinjalali
Member


We should have a separate PR adding our names to this list. 😅

Member Author


lol, yeah, I've added mine in a few PRs, it'll be there once one of them gets merged lol

@NicolasHug NicolasHug merged commit f9af18b into scikit-learn:master Apr 15, 2019
@NicolasHug
Member

Thanks Adrin

@adrinjalali adrinjalali deleted the forest/feature_importances branch April 15, 2019 12:32
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019