DOC Rewrite user-guide to clarify feature_importance_ are impurity based#16237

Merged
rth merged 7 commits into scikit-learn:master from ysunmi0427:doc-update
Feb 1, 2020

Conversation

@ysunmi0427
Contributor

Reference Issues/PRs

Closes #14528. See also #14530

What does this implement/fix? Explain your changes.

I clarify feature importance in every tree method by mentioning that it is impurity-based. In the user guide, the examples, and the docstrings, it is now clear that feature importance comes from the impurity concept. In a few places, I point to the "Permutation Importance vs Random Forest Feature Importance (MDI)" example to show that impurity-based importance is not the only choice and that we have a good alternative.
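To illustrate the distinction the doc changes are making, here is a minimal sketch (dataset and parameters are illustrative, not from this PR) comparing the impurity-based `feature_importances_` attribute with `permutation_importance` computed on held-out data:

```python
# Sketch: impurity-based (MDI) importances vs permutation importances.
# The model, dataset, and hyperparameters are arbitrary examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Impurity-based importances (Mean Decrease in Impurity), computed from
# statistics derived from the training set only.
mdi = clf.feature_importances_

# Permutation importances, computed on a held-out test set, do not
# suffer from the training-set bias discussed in this PR.
perm = permutation_importance(clf, X_test, y_test, n_repeats=5,
                              random_state=0)
print(mdi)
print(perm.importances_mean)
```

The two rankings often agree on strong features but can diverge for features the trees overfit on, which is exactly why the user guide now names the importance as impurity-based.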

Member

@NicolasHug NicolasHug left a comment


Thanks @ysunmi0427, a few comments

@ysunmi0427
Contributor Author

@NicolasHug I fixed the lines you mentioned. Thank you for your reviews!

Member

@NicolasHug NicolasHug left a comment


Nit but LGTM

Member

@rth rth left a comment


Thanks, a few comments, otherwise LGTM.

@rth rth changed the title Rewrite user-guide to clarify feature_importance_ are impurity based DOC Rewrite user-guide to clarify feature_importance_ are impurity based Jan 27, 2020
@ysunmi0427
Copy link
Copy Markdown
Contributor Author

@rth I fixed the lines you mentioned. Thanks for your reviews!

to the prediction function.

The impurity-based feature importance suffers from being computed
on statistics derived from the training dataset.
Member


I think that the issue is not only about deriving statistics from the training set, but also the bias toward high-cardinality features.
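The high-cardinality bias mentioned here can be sketched as follows (a toy demonstration with made-up data, not part of the PR): an uninformative continuous column, having many possible split thresholds, typically receives a larger impurity-based importance than an equally uninformative binary column.

```python
# Sketch of the bias of MDI toward high-cardinality features.
# Both appended columns are pure noise; the continuous one (many unique
# values) tends to collect more impurity-based importance than the
# binary one (two unique values).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
low_card = rng.randint(0, 2, size=(300, 1))   # binary noise column
high_card = rng.randn(300, 1)                 # continuous noise column
X_aug = np.hstack([X, low_card, high_card])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_aug, y)
mdi = clf.feature_importances_
print(mdi)  # last entry (continuous noise) usually beats the binary one
```

Permutation importance on held-out data largely avoids this artifact, since shuffling either noise column leaves test performance essentially unchanged.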

Member

@glemaitre glemaitre left a comment


LGTM otherwise

@rth rth added the Needs work label Jan 29, 2020
@ysunmi0427
Contributor Author

@glemaitre I fixed the line you mentioned, noting that impurity-based feature importance favors high-cardinality features (typically numerical features).

Member

@rth rth left a comment


Thanks @ysunmi0427, LGTM.

impurity-based feature importance favors high cardinality features (typically numerical features).

Pushed a quick fix removing the last part, because high cardinality features are usually categorical, not numeric (at least before an encoding is applied to them).

Will merge when CI is green.

@rth rth merged commit 4a18796 into scikit-learn:master Feb 1, 2020
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020

Development

Successfully merging this pull request may close these issues.

Rewrite user-guide to clarify feature_importances_ are impurity based

5 participants